I am trying to write a Python 3.6 script that will add key/value pairs from a folder tree dictionary to a CSV file. Files in the folder tree are the keys and their paths are the values.
There seems to be an error in how I am iterating through the dictionary because in the csv file I only get the key/value pairs from one of the folders, and not the entire folder tree. I just don't see where my error is. Here is my code:
import os
import csv

root_dir = '.'
for root, dirs, files in os.walk (root_dir, topdown='true'):
    folder_dict = {filename:root for filename in files}
    print (folder_dict)

with open ('test.csv', 'w') as csvfile:
    for key in folder_dict:
        csvfile.write ('%s, %s\n'% (key, folder_dict [key]))
I get the dictionary but in the csv file there are only the key/value pairs for one item.
Because of the line folder_dict = {filename:root for filename in files}, you overwrite the data on each loop, leaving the last dictionary as the only thing for the later write to the CSV.
You don't really need this interim data structure at all. Just write the CSV as you discover files to write. You weren't actually using the CSV module, so I added it to the solution.
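import os
import csv

root_dir = '.'
# newline='' lets the csv module control the line endings
with open('test.csv', 'w', newline='') as fileobj:
    csvfile = csv.writer(fileobj)
    for root, dirs, files in os.walk(root_dir, topdown=True):
        # one (filename, folder) row per file found in this folder
        csvfile.writerows((filename, root) for filename in files)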
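If you do want to keep a dictionary around, the overwrite can be avoided by creating the dictionary once and adding to it on every iteration. This is only a minimal sketch: the helper name build_folder_dict is made up for illustration, and it keys on the full path rather than the bare filename, on the assumption that two folders may contain files with the same name.

```python
import os

def build_folder_dict(root_dir):
    """Map each file's full path to the folder that contains it."""
    folder_dict = {}  # created once, outside the loop, so nothing is lost
    for root, dirs, files in os.walk(root_dir):
        for filename in files:
            # Key on the full path so equal filenames in different
            # folders do not overwrite each other.
            folder_dict[os.path.join(root, filename)] = root
    return folder_dict
```

With bare filenames as keys, a later file would silently replace an earlier one of the same name, which is another reason to write rows directly instead.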
Related
I have a folder full of csv files that contain results for different participants in an experiment. I'm trying to create one large csv file containing all the csv files in the directory. I'm using listdir() to create a list of all the files but I haven't been able to open the individual files in this list. I think I need to loop over each file but I haven't been able to figure out how.
This is the code I've come up with so far. I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'results_262.csv' because it appears that the loop only reads one file in the files variable even though there should be many.
from os import listdir

path = "results"
files = listdir(path)
print(files)

results = open("results.csv", "w")
data = []
for file in files:
    participant = open(f"{file}", "r")
    data.append(participant.readlines())
Would anyone be able to help?
Your issue is that listdir() returns the filenames without their path information, so you need to append the filename to the path.
import os
...
for file in files:
    # os.path.join builds "results/<filename>" from the two parts
    participant = open(os.path.join(path, file), "r")
    ...
Other details:
f"{file}" is the same thing as just file. I.e., open(file, "r") is equivalent to open(f"{file}", "r"), but more efficient. There is no need to use interpolation when you already have the value you want in a variable.
And, you're not closing your files. You should add participant.close() in your loop, or, even better, use a context manager:
for file in files:
    with open(os.path.join(path, file), "r") as participant:
        ...
The with puts your file handle in a context manager, and the file gets closed automatically when you leave the scope of the context manager.
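Putting the two points together, a combined file could be built roughly like this. It is a sketch under a few assumptions: the function name combine_files is hypothetical, the output file lives outside the source folder (otherwise it would be read back in), and header rows are copied as-is rather than de-duplicated.

```python
import os

def combine_files(src_dir, out_path):
    """Append the lines of every .csv file in src_dir to out_path."""
    with open(out_path, "w") as results:
        for name in sorted(os.listdir(src_dir)):
            if not name.endswith(".csv"):
                continue  # skip anything that is not a CSV file
            # listdir() returns bare names, so join them to the directory
            with open(os.path.join(src_dir, name), "r") as participant:
                results.writelines(participant.readlines())
```

For the question above this would be called as combine_files("results", "results.csv").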
import os

# raw string so the backslashes are not treated as escape sequences
for folderName, subFolders, fileNames in os.walk(r'C:\Home\Homework\Folders'):
    for file in fileNames:
        print(folderName, os.path.join(folderName, file))
Let's say that there are 3 folders Named One, Two, and Three that live in C:\Home\Homework\Folders.
I want a script that will create a list or table such as:
Folder    File Link
One       C:\Home\Homework\Folders\One\sample.pdf
One       C:\Home\Homework\Folders\One\sample.txt
Two       C:\Home\Homework\Folders\Two\test1.pdf
Two       C:\Home\Homework\Folders\Two\test.csv
Three     C:\Home\Homework\Folders\Three\excel.xlsx
My end goal is to export a list to a CSV file.
As I mentioned in the comments, it's not clear to me whether you intend to walk subdirectories too. Keep in mind that os.walk descends into every subdirectory by default, so if you only want the top levels you'll need a different solution.
I believe the following would achieve your task; however, I did not test this code.
import os
import csv

# Open your output file in a context manager
with open("my_output.csv", "w", newline="") as output_file:
    # Create a DictWriter instance using your output file and your desired header
    writer = csv.DictWriter(output_file, fieldnames=("Folder", "File Link"))
    # Write the header to the file
    writer.writeheader()
    # Walk the target directory (raw string keeps the backslashes literal)
    for folder_name, _, file_names in os.walk(r'C:\Home\Homework\Folders'):
        for file_name in file_names:
            file_link = os.path.join(folder_name, file_name)
            # Construct a dict whose keys match the `fieldnames` from earlier
            writer.writerow({"Folder": folder_name, "File Link": file_link})
Note the _ in the top-level for loop. Using _ as a variable name in Python is a conventional way of signifying to others reading your code that you do not care about the value and are not planning on using it anywhere. In this case _ will be a list of directory names (or empty if there are no subdirectories in a given directory under C:\Home\Homework\Folders). Ignoring that list does not stop the walk: os.walk still descends into every subdirectory.
For example, suppose your file structure is
C/
    Home/
        Homework/
            Folders/
                One/
                    sample.pdf
                    sample.txt
                    SomeSubDirectory/
                        some_file.py
You would still find some_file.py, because os.walk visits SomeSubDirectory even though the loop never uses the list of directory names.
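Since os.walk visits every subdirectory by default, skipping deeper folders has to be done explicitly. One way, sketched here, is to empty the directory list in place once a chosen depth is reached (this relies on the default top-down walk). The names list_files and max_depth are my own, not something from the question.

```python
import os

def list_files(target_dir, max_depth):
    """Return (folder, file path) pairs, descending at most max_depth levels."""
    rows = []
    for folder_name, dir_names, file_names in os.walk(target_dir):
        rel = os.path.relpath(folder_name, target_dir)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying the list in place stops os.walk from going deeper.
            dir_names[:] = []
        for file_name in file_names:
            rows.append((folder_name, os.path.join(folder_name, file_name)))
    return rows
```

With max_depth=1 the files directly inside One, Two and Three are listed, but nothing inside SomeSubDirectory is.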
I've got 2 folders, each with a different CSV file inside (both have the same format):
I've written some python code to search within the "C:/Users/Documents" directory for CSV files which begin with the word "File"
import glob, os

inputfile = []
for root, dirs, files in os.walk("C:/Users/Documents/"):
    for datafile in files:
        if datafile.startswith("File") and datafile.endswith(".csv"):
            inputfile.append([os.path.join(root, datafile)])
print(inputfile)
That almost worked as it returns:
[['C:/Users/Documents/Test A\\File 1.csv'], ['C:/Users/Documents/Test B\\File 2.csv']]
Is there any way I can get it to return this instead (no sub list and shows / instead of \):
['C:/Users/Documents/Test A/File 1.csv', 'C:/Users/Documents/Test B/File 2.csv']
The idea is so I can then read both CSV files at once later, but I believe I need to get the list in the format above first.
Okay, I will paste an option here.
I made use of os.path.abspath to get the absolute path of the joined result.
Have a look and see if it works.
import os

filelist = []
for folder, subfolders, files in os.walk("C:/Users/Documents/"):
    for datafile in files:
        if datafile.startswith("File") and datafile.endswith(".csv"):
            # append the path string itself, not a one-element list
            filePath = os.path.abspath(os.path.join(folder, datafile))
            filelist.append(filePath)
print(filelist)
Result:
['C:/Users/Documents/Test A/File 1.csv','C:/Users/Documents/Test B/File 2.csv']
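One caveat: on Windows, os.path.abspath normalizes separators to backslashes, so the output there may not match the result shown. If forward slashes are what you need, pathlib can convert the string regardless of the platform the script runs on. This is a small sketch, and the helper name to_forward_slashes is only illustrative.

```python
from pathlib import PureWindowsPath

def to_forward_slashes(path):
    """Return a Windows-style path written with forward slashes only."""
    return PureWindowsPath(path).as_posix()

mixed = ['C:/Users/Documents/Test A\\File 1.csv',
         'C:/Users/Documents/Test B\\File 2.csv']
print([to_forward_slashes(p) for p in mixed])
```

PureWindowsPath only manipulates the string and never touches the file system, so the files do not need to exist for the conversion to work.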
As the title would imply, I am looking to create a script that will allow me to print a list of file names in a directory to a CSV file.
I have a folder on my desktop that contains approximately 150 PDFs. I'd like to be able to have the file names printed to a CSV.
I am brand new to Python and may be jumping out of the frying pan and into the fire with this project.
Can anyone offer some insight to get me started?
First off, you will want to grab all of the files in the directory, then simply write them to a file.
from os import listdir
from os.path import isfile, join
import csv

onlyfiles = [f for f in listdir("./") if isfile(join("./", f))]
# newline='' lets the csv module control the line endings
with open('file_name.csv', 'w', newline='') as print_to:
    writer = csv.writer(print_to)
    # writes all the file names as a single row
    writer.writerow(onlyfiles)
Please note:
"./" in the listdir and join calls is the directory you want to grab the files from.
Please replace 'file_name.csv' with the name of the file you want to write to.
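Because writerow puts every name into one row, a list of 150 PDFs ends up as a single very wide line. If one name per row is easier to work with, writerows with one-element lists does that. A sketch, assuming a hypothetical helper called write_names and that the CSV is written outside the folder being listed:

```python
import csv
from os import listdir
from os.path import isfile, join

def write_names(directory, csv_path):
    """Write each file name in directory to its own row of csv_path."""
    names = sorted(f for f in listdir(directory) if isfile(join(directory, f)))
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        # each row is a one-element list, so every name gets its own line
        writer.writerows([name] for name in names)
    return names
```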
The following will create a csv file with all *.pdf files:
from glob import glob

with open('/tmp/filelist.csv', 'w') as fout:
    # write the csv header -- optional
    fout.write("filename\n")
    # write each filename with a newline character
    fout.writelines(['%s\n' % fn for fn in glob('/path/to/*.pdf')])
glob() is a nice shortcut to using listdir because it supports wildcards.
import os

csvpath = "csvfile.csv"
dirpath = "."
# text mode ("w"), because a str is written, not bytes
f = open(csvpath, "w")
f.write(",".join(os.listdir(dirpath)))
f.close()
This can be improved to present the filenames in whatever way you need, for example so that they can be read back later. For instance, this most probably won't write Unicode filenames as UTF-8 and may garble the encoding, but that is easy to fix.
If you have a very big directory with many files, you may have to wait some time for os.listdir() to collect them all. This can also be fixed by using another method instead of os.listdir().
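Both points can be handled with the standard library. The sketch below is one possible approach, not the only one: it passes an explicit encoding to open() and uses os.scandir(), which yields entries one at a time instead of building the whole list first. The function name write_listing is made up for this example, and it assumes the CSV is written outside the directory being listed.

```python
import csv
import os

def write_listing(dirpath, csvpath):
    """Stream the file names in dirpath to csvpath, one name per row."""
    count = 0
    # explicit UTF-8 so non-ASCII filenames are written predictably
    with open(csvpath, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_file():  # skip subdirectories
                    writer.writerow([entry.name])
                    count += 1
    return count
```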
To differentiate between files and subdirectories see Michael's answer.
Also, using os.path.isfile() or os.path.isdir() you can recursively get all subdirectories if you wish.
Like this:
import os

def getall(path):
    files = []
    for x in os.listdir(path):
        x = os.path.join(path, x)
        if os.path.isdir(x):
            # recurse into the subdirectory and collect its files too
            files += getall(x)
        else:
            files.append(x)
    return files
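For ordinary files and folders, the same result can be had without writing the recursion yourself, since os.walk already visits every subdirectory. A short sketch; the name getall_walk is only for comparison with the function above, and the order of the returned paths may differ. Symbolic links to directories are also treated differently, because os.walk does not follow them by default.

```python
import os

def getall_walk(path):
    """Return every file path under path, subdirectories included."""
    return [os.path.join(root, name)
            for root, _dirs, names in os.walk(path)
            for name in names]
```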
Filenames:
File1: new_data_20100101.csv
File2: samples_20100101.csv
The timestamp is always in %Y%m%d format and appears in the filename after a _ and before .csv.
I want to find the dates for which there is both a data file and a samples file, and then do something with those files:
My Code so far:
for all_files in os.listdir():
    if all_files.__contains__("data_"):
        dataList.append(all_files.split('_')[2])
    if all_files.__contains__("samples_"):
        samplesList.append(all_files.split('_')[1])
that gives me the filenames cut down to the Timestamp and the extension .csv
Now I would like to try something like this
for day in dataList:
    if day in samplesList:
        open day as csv.....
I get a list of days where both files have timestamps. How can I undo that split now so I can go on working with the files? As it stands I would get an error telling me that, for instance, _2010010.csv does not exist, because the file is really new_data_2010010.csv.
I'm kind of unsure how to use os.path.basename, so I would appreciate some advice on handling the file names.
thanks
You could instead use the glob module to get your list. This allows you to filter just your CSV files.
The following script creates two dictionaries with the key for each dictionary being the date portion of your filename and the value holding the whole filename. A list comprehension creates a list of tuples holding each matching pair:
import glob
import os

csv_files = glob.glob('*.csv')
data_files = {file.split('_')[2] : file for file in csv_files if 'data_' in file}
sample_files = {file.split('_')[1] : file for file in csv_files if 'samples_' in file}
matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]

for sample_file, data_file in sorted(matching_pairs):
    print('{} <-> {}'.format(sample_file, data_file))
For your two file example, this would display the following:
samples_20100101.csv <-> new_data_20100101.csv
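The split('_') indexes above depend on how many underscores each filename has. If that might vary, a regular expression that pulls out the eight digits before .csv is a little more forgiving. This is a sketch under the assumption that every file of interest ends in _YYYYMMDD.csv; the names DATE_RE and match_pairs are invented for the example.

```python
import re

# eight digits after an underscore, right before the .csv extension
DATE_RE = re.compile(r"_(\d{8})\.csv$")

def match_pairs(filenames):
    """Return sorted (samples file, data file) pairs that share a date."""
    data, samples = {}, {}
    for name in filenames:
        found = DATE_RE.search(name)
        if not found:
            continue  # no date in the expected position, so skip the file
        date = found.group(1)
        if "samples_" in name:
            samples[date] = name
        elif "data_" in name:
            data[date] = name
    return sorted((samples[d], data[d]) for d in data if d in samples)
```

Because the dictionaries hold the complete filenames, there is nothing to "undo": each pair can be passed straight to open().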