Trying to output directory listing to a csv file - python

I'm quite new to Python and programming in general, so I hope you won't mind me asking a perhaps very basic question.
I'm using the following code to output a directory listing to Excel:
import os
a = open('H:\output.csv', "w")
for path, subdirs, files in os.walk(r'.'):
    for filename in files:
        f = os.path.join(path, filename)
        a.write(str(f) + os.linesep)
The problem is that some of the filenames are being cut off and instead of being entirely contained in column A, the last 6 or so characters are getting split into column B.
Additionally, there is a linebreak between each row, which ideally I would like to get rid of.
Lastly, I'd like to have a second column which contains only the filename, rather than the full path.

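Try this. It encloses the file paths and file names each in double quote characters ("), so a comma inside a name is no longer treated as a column separator, which is most likely why the end of some names ends up in column B. It also writes '\n' instead of os.linesep: in text mode Python already translates '\n' to the platform's line ending, so os.linesep produces '\r\r\n' on Windows, which shows up as an empty row. I also had to specify an encoding for the output file because on my system paths are Unicode but the default encoding for files is not.
import os
with open('dirfiles.csv', "w", encoding='utf8') as a:
    for path, subdirs, files in os.walk(r'.'):
        for filename in files:
            f = os.path.join(path, filename)
            # quote both fields; '\n' is translated to the OS line ending
            a.write('"%s","%s"\n' % (f, filename))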
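A variation, sketched here as an alternative rather than the answer's own code, uses the standard library's csv module. It quotes fields only when needed and also escapes any double quote that appears inside a file name, which manual '"%s"' quoting would not handle. The function name write_listing and the header row are illustrative choices:

```python
import csv
import os


def write_listing(root, out_path):
    """Write one CSV row per file under root: full path, then bare file name."""
    # newline='' lets the csv writer control line endings, so no blank rows
    with open(out_path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(['path', 'filename'])  # optional header row
        for path, subdirs, files in os.walk(root):
            for filename in files:
                # the writer quotes a field automatically if it contains a comma
                writer.writerow([os.path.join(path, filename), filename])
```

For example, write_listing('.', 'dirfiles.csv') lists the current directory tree. Opening the file with newline='' is what the csv module's documentation recommends, because the writer supplies its own line endings.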

Related

Python to rename files in a directory/folder to csv

I have written a small script to hopefully iterate through my directory/folder and replace act with csv. Essentially, I have 11 years' worth of files that have a .act extension and I just want to replace it with .csv.
import os
files = os.listdir("S:\\folder\\folder1\\folder2\\folder3")
path = "S:\\folder\\folder1\\folder2\\folder3\\"
#print(files)
for x in files:
    new_name = x.replace("act","csv")
    os.rename(path+x,path+new_name)
    print(new_name)
When I executed this, it worked for the first five files and then failed on the sixth with the following error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'S:\\folder\\folder1\\folder2\\folder3\\file_2011_06.act' -> 'S:\\folder\\folder1\\folder2\\folder3\\file_2011_06.csv'
When I searched for "S:\folder\folder1\folder2\folder3\file_2011_06.act" in File Explorer, the file opened. Are there any tips on what additional steps I can take to debug this issue?
Admittedly, this is my first programming script. I'm trying to do small/minor things to start learning. So, I likely missed something... Thank you!
In your solution, you use the string method replace to replace "act" with "csv". replace substitutes every occurrence, not just the extension, so this could lead to problems if a name or path contains "act" somewhere else. For example, applied to S:\\facts\\file_2011_01.act it would produce S:\\fcsvs\\file_2011_01.csv, and rename would throw a FileNotFoundError because rename cannot create folders.
When dealing with file names (e.g., concatenating path fragments, extracting file extensions, ...), I recommend using os.path or pathlib instead of direct string manipulation.
I would like to propose another solution using os.walk. In contrast to os.listdir, it recursively traverses all sub-directories in a single loop.
import os

def act_to_csv(directory):
    for root, folders, files in os.walk(directory):
        for file in files:
            # split "file_2011_06.act" into ("file_2011_06", ".act")
            filename, extension = os.path.splitext(file)
            if extension == '.act':
                original_filepath = os.path.join(root, file)
                new_filepath = os.path.join(root, filename + '.csv')
                print(f"{original_filepath} --> {new_filepath}")
                os.rename(original_filepath, new_filepath)

# act_to_csv("S:\\folder\\folder1\\folder2\\folder3")
Also, I'd recommend backing up your files before manipulating them with scripts. It would be annoying to lose data or see it become a mess because of a bug in a script.
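Since pathlib is mentioned above, here is a minimal sketch of the same recursive rename using it. It assumes Python 3 and adds a hypothetical dry_run flag (not part of the answer) so the planned renames can be reviewed before anything on disk changes:

```python
from pathlib import Path


def act_to_csv_pathlib(directory, dry_run=True):
    """Rename every *.act file under directory (recursively) to *.csv.

    dry_run=True (a hypothetical safety switch) only returns the planned
    (old, new) pairs without renaming anything.
    """
    planned = []
    # sorted() collects all matches first, so renaming cannot disturb the search
    for original in sorted(Path(directory).rglob('*.act')):
        if not original.is_file():
            continue
        # with_suffix() replaces only the final extension, never other text
        new_path = original.with_suffix('.csv')
        planned.append((original, new_path))
        if not dry_run:
            original.rename(new_path)
    return planned
```

One way to use it: call act_to_csv_pathlib(r'S:\folder\folder1\folder2\folder3') first, print the returned pairs, and only then call it again with dry_run=False.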
import os
folder="S:\\folder\\folder1\\folder2\\folder3\\"
count=1
# Note: this renames EVERY entry in the folder to 1.csv, 2.csv, ...
# and discards the original names.
for file_name in os.listdir(folder):
    source = folder + file_name
    destination = folder + str(count) + ".csv"
    os.rename(source, destination)
    count += 1
print('All Files Renamed')
print('New Names are')
res = os.listdir(folder)
print(res)

Python: Provide a string, open folder that contains the string in the folder name, then open/read a file in that folder

I have a highly branched folder structure as shown below. I want to match a barcode, which is nested between other descriptors and open/read a file of interest in that barcode's folder. The contents of each XXbarcodeXX folder are basically the same.
I have tried to use os.walk(), glob.glob(), and os.listdir() in combination with fnmatch, but none yielded the correct folder, and glob.glob() just returned an empty list, which I think means it didn't find anything.
The closest attempt is below. I did not let it finish because it seemed to be going top-down through every folder rather than just checking the folder names at the second level. This was taking far too long because some folders at the third and fourth levels contain hundreds of files/folders.
import os
import re
path='C:\\my\\path'
barcode='barcode2'
for dirname, dirs, files in os.walk(path):
    for folds in dirs:
        if re.match('*'+barcode+'*', folds):
            f = open(os.path.join(dirname+folds)+'FileOfInterest.txt', 'w')
The * at the start of the regex you pass to re.match will generate an error (nothing to repeat at position 0), since * is a quantifier (zero or more times) and there is no preceding token for it to repeat. You may try replacing your regex with '..' + barcode + '..'. Because re.match anchors at the start of the string, this matches folder names that have exactly two characters (any except line terminators) before the barcode and at least two after it. In the os.path.join call, you can join the directory, the folder name, and the desired file in a single call to avoid any issues with the OS-specific separator.
import os
import re
path='dirStructure'
barcode='barcode2'
for dirname, dirs, files in os.walk(path):
    for folds in dirs:
        if re.match('..' + barcode + '..', folds):
            # 'with' closes the file automatically when the block ends
            with open(os.path.join(dirname, folds, 'FileOfInterest.txt'), 'r') as f:
                print(f.readlines())
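To address the speed concern in the question, os.walk can be stopped from descending further by emptying its dirs list in place; this works because os.walk walks top-down by default. The sketch below assumes the barcode folders sit exactly two levels below the root (root/level1/XXbarcodeXX); the function name find_barcode_files and its depth parameter are hypothetical. It uses a plain substring test instead of a regex, so note that 'barcode2' would also match a folder named 'XXbarcode20XX':

```python
import os


def find_barcode_files(root, barcode, filename='FileOfInterest.txt', depth=2):
    """Return paths of `filename` inside folders whose names contain
    `barcode`, looking only at folders `depth` levels below `root`."""
    matches = []
    for dirname, dirs, files in os.walk(root):
        rel = os.path.relpath(dirname, root)
        # root itself is level 0; each path component below it adds one
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if level == depth - 1:
            for folder in dirs:
                if barcode in folder:  # substring test, no regex needed
                    candidate = os.path.join(dirname, folder, filename)
                    if os.path.isfile(candidate):
                        matches.append(candidate)
        if level >= depth - 1:
            # Emptying dirs in place stops os.walk from going any deeper
            dirs[:] = []
    return matches
```

Each returned path can then be opened as usual, e.g. looping over find_barcode_files(r'C:\my\path', 'barcode2') and reading each file inside a with open(...) block.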

Save output of a script in txt (Windows)

I created a script to see all the files in a folder and print the full path of each file.
The script is working and prints the output in the Command Prompt (Windows)
import os
root = 'C:\Users\marco\Desktop\Folder'
for path, subdirs, files in os.walk(root):
    for name in files:
        print os.path.join(path, name)
I now want to save the output to a .txt file, so I edited the code to assign os.path.join(path, name) to a variable, values, but when I write the output the script gives me an error:
import os
root = 'C:\Users\marco\Desktop\Folder'
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
file = open('sample.txt', 'w')
file.write(values)
file.close()
Error below
file.write(values)
NameError: name 'values' is not defined
Try this:
file = open('sample.txt', 'w')
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
        file.write(values+'\n')
file.close()
Note that file is a built-in name in Python 2 (which is overridden here), so I suggest that you replace it with fileDesc or similar.
The problem is that values is never assigned if the inner loop body never runs, for example when os.walk() finds no files. Python for loops don't create their own scope, but a name that was never assigned is still undefined after the loop. So assign an empty value to the variable before you start the iteration, like values=None or, better yet, values=''. Now, even if the above code had worked, you wouldn't get the output file you wanted: values is overwritten on every iteration, so after the loop it would hold only the location of the last file encountered, and that single path is all that would be written to sample.txt.
Another bad practice you seem to be following is using \ instead of \\ inside strings. This might come back to bite you later (if it hasn't already). A \ followed by certain letters denotes an escape sequence (for example \n or \t), and \\ is the escape sequence for a literal backslash. Alternatively, use a raw string such as r'C:\Users\marco\Desktop\Folder'.
So here's a redesigned working sample code:
import os
root = 'C:\\Users\\marco\\Desktop\\Folder'
values=''
for path, subdirs, files in os.walk(root):
    for name in files:
        values = values + os.path.join(path, name) + '\n'
samplef = open('sample.txt', 'w')
samplef.write(values)
samplef.close()
In case you aren't familiar, '\n' is the escape sequence for a newline. Reading your output file would be quite tedious if all your files had been written on the same line.
PS: I wrote the code with strings as that's what I would prefer in this scenario, but you may try lists or whatnot. Just be sure to define the variable before the loop, so that it exists even if the loop body never runs.
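A quick way to check the scoping behaviour discussed here: in Python a for loop does not create a new scope, so a name assigned in the loop body is still visible after the loop, and it is only undefined when the body never ran. The helper last_value below is purely illustrative:

```python
def last_value(items):
    """Return the last item the loop saw, or None if the loop never ran."""
    for item in items:
        value = item  # assigned inside the loop body
    try:
        return value  # still visible here: the loop has no scope of its own
    except NameError:  # UnboundLocalError (a NameError subclass) if never assigned
        return None


print(last_value(['a.txt', 'b.txt']))  # b.txt
print(last_value([]))  # None
```

os.walk() yields nothing at all for a folder that does not exist (its errors are ignored unless you pass onerror), which is one way the loop body can end up never running.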
import os
root = r'C:\Users\marco\Desktop\Folder'
file = open('sample.txt', 'w')
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
        file.write(values + '\n')
file.close()
values is never assigned if the loop body doesn't run, so referring to it after the loop can fail with a NameError. Open the file before the loop and write each path inside the inner loop, as above, and it will work.
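Putting the advice from these answers together, a minimal Python 3 sketch might look like this. save_listing is a hypothetical name, and the folder path in the commented call is the one from the question:

```python
import os


def save_listing(root, out_path):
    """Write the full path of every file under root to out_path, one per line."""
    # 'with' closes the file even if an error occurs part-way through
    with open(out_path, 'w', encoding='utf-8') as out:
        for path, subdirs, files in os.walk(root):
            for name in files:
                # print() adds the newline for us
                print(os.path.join(path, name), file=out)


# Raw string keeps the backslashes literal:
# save_listing(r'C:\Users\marco\Desktop\Folder', 'sample.txt')
```

Alternatively, the original print-based script can be left unchanged and its output redirected from the Command Prompt with python script.py > sample.txt, where script.py stands in for your script's file name.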

Python: Removing Leading Zeros in the Filename of Every File in a Folder and Subfolders

I would think that this is a basic task but most of my searches refer to adding zeros. I just want to strip the leading zeros from every file. I have files like "01.jpg, 02.jpg... 09.jpg".
Chapter 9 of Automate the Boring Stuff talks specifically about using Python for this task but does not go over any examples of this.
import os, sys
for filename in os.listdir(os.path.dirname(os.path.abspath(__file__))):
Well that is the start of what I have so far.
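import os

folder = r'path\to\folder'  # placeholder: the folder containing the files
for filename in os.listdir(folder):
    if filename.startswith('0'):
        os.rename(os.path.join(folder, filename), os.path.join(folder, filename[1:]))
It removes the first character from every file name that starts with '0' (in your case, the '0' that you want to get rid of). I recommend using it with caution because it's irreversible.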
Use this to get all files under your root folder:
import os

def get_files(path):
    files = []
    # in Python 2, pass unicode(path) to get Unicode file names back
    for root, directories, file_names in os.walk(path):
        for filename in file_names:
            files.append((filename, root))
    return files
Then simply iterate over the files and rename them:
for f, p in get_files(path):
    if f.startswith('0'):
        os.rename(os.path.join(p, f), os.path.join(p, f[1:]))
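Both answers strip exactly one character, so 007.jpg would become 07.jpg. The sketch below is a variation, not either answer's code: it removes all leading zeros with str.lstrip, keeps a bare 0 so that 0.jpg never turns into .jpg, and skips a rename when the target name already exists, because in that case os.rename raises an error on Windows and silently replaces the file on POSIX systems. The name strip_leading_zeros is hypothetical:

```python
import os


def strip_leading_zeros(root):
    """Remove leading zeros from file names under root (recursively),
    e.g. 01.jpg -> 1.jpg and 007.jpg -> 7.jpg. Returns (old, new) pairs."""
    renamed = []
    for path, subdirs, files in os.walk(root):
        for name in files:
            stem, ext = os.path.splitext(name)
            # lstrip('0') removes every leading zero; fall back to '0' so
            # names like 000.jpg become 0.jpg instead of .jpg
            new_stem = stem.lstrip('0') or '0'
            if new_stem == stem:
                continue  # nothing to strip
            old_path = os.path.join(path, name)
            new_path = os.path.join(path, new_stem + ext)
            if os.path.exists(new_path):
                print(f'Skipping {old_path}: {new_path} already exists')
                continue
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

As with the answers above, this cannot be undone, so it is worth trying on a copy of the folder first.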

I am trying to write a Python Script to Print a list of Files In Directory

As the title implies, I am looking to create a script that will allow me to print a list of the file names in a directory to a CSV file.
I have a folder on my desktop that contains approximately 150 PDFs. I'd like to have the file names printed to a CSV.
I am brand new to Python and may be jumping out of the frying pan and into the fire with this project.
Can anyone offer some insight to get me started?
First, grab all of the files in the directory, then write them to a file.
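from os import listdir
from os.path import isfile, join
import csv
onlyfiles = [f for f in listdir("./") if isfile(join("./", f))]
# newline='' stops the csv module producing blank rows on Windows
with open('file_name.csv', 'w', newline='') as print_to:
    writer = csv.writer(print_to)
    # one file name per row; writerow(onlyfiles) would put them all in one row
    writer.writerows([f] for f in onlyfiles)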
Please Note
"./" in the listdir() and join() calls is the directory you want to grab the files from.
Please replace 'file_name.csv' with the name of the file you want to write to.
The following will create a csv file with all *.pdf files:
from glob import glob
with open('/tmp/filelist.csv', 'w') as fout:
# write the csv header -- optional
fout.write("filename\n")
# write each filename with a newline characer
fout.writelines(['%s\n' % fn for fn in glob('/path/to/*.pdf')])
glob() is a nice shortcut to using listdir because it supports wildcards.
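For the PDF case in the question, glob can be combined with the csv module so that each name lands in its own row and names containing commas are quoted. This sketch writes only the bare file name (os.path.basename) rather than the full path; the function name pdfs_to_csv and the sorting are illustrative choices:

```python
import csv
import glob
import os


def pdfs_to_csv(folder, out_path):
    """Write the names of the *.pdf files in folder to out_path, one per row."""
    # glob.escape() stops characters such as [ in the folder name acting as wildcards
    pattern = os.path.join(glob.escape(folder), '*.pdf')
    names = sorted(os.path.basename(p) for p in glob.glob(pattern))
    # newline='' lets the csv writer control line endings (no blank rows)
    with open(out_path, 'w', newline='', encoding='utf-8') as fout:
        writer = csv.writer(fout)
        writer.writerow(['filename'])  # optional header row
        writer.writerows([name] for name in names)
    return names
```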
import os
csvpath = "csvfile.csv"
dirpath = "."
f = open(csvpath, "w")
f.write(",".join(os.listdir(dirpath)))
f.close()
This can be adapted to present the filenames in whatever way you need, for example in a form that is easy to read back in. For instance, non-ASCII (Unicode) filenames will most probably not end up in UTF-8 form and the encoding may get mangled, but that is easy to fix, e.g. by passing encoding='utf-8' to open() in Python 3.
If you have a very big directory with many files, you may have to wait some time for os.listdir() to return them all. This can be avoided with os.scandir() (Python 3.5+), which returns an iterator, so you can start processing entries before the whole directory has been read.
To differentiate between files and subdirectories see Michael's answer.
Also, using os.path.isfile() or os.path.isdir(), you can recursively get the contents of all subdirectories if you wish.
Like this:
import os

def getall(path):
    files = []
    for x in os.listdir(path):
        x = os.path.join(path, x)
        if os.path.isdir(x):
            files += getall(x)
        else:
            files.append(x)
    return files
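os.scandir() (available since Python 3.5) yields DirEntry objects that can often report whether an entry is a directory using information gathered while scanning, so it is typically faster than os.listdir() plus os.path.isdir() on big trees. A generator version of getall built on it might look like this (iter_files is a hypothetical name):

```python
import os


def iter_files(path):
    """Yield the path of every file under path, recursing into subfolders."""
    with os.scandir(path) as entries:  # context-manager form needs Python 3.6+
        for entry in entries:
            # follow_symlinks=False avoids looping through symlinked folders
            # (getall above, using os.path.isdir, does follow them)
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            else:
                yield entry.path
```

Because it is a generator, results start arriving before the whole tree has been read; wrap it in list(...) if you need all the paths at once.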
