Creating one csv file containing data from many csv files - python

I have a folder full of csv files that contain results for different participants in an experiment. I'm trying to create one large csv file containing all the csv files in the directory. I'm using listdir() to create a list of all the files but I haven't been able to open the individual files in this list. I think I need to loop over each file but I haven't been able to figure out how.
This is the code I've come up with so far. I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'results_262.csv' because it appears that the loop only reads one file in the files variable even though there should be many.
from os import listdir
path = "results"
files = listdir(path)
print(files)
results = open("results.csv", "w")
data = []
for file in files:
    participant = open(f"{file}", "r")
    data.append(participant.readlines())
Would anyone be able to help?

Your issue is that listdir() returns bare filenames without their directory, so you need to join each filename to the path before opening it.
import os
...
for file in files:
    participant = open(os.path.join(path, file), "r")
...
Other details:
f"{file}" is the same thing as just file. I.e., open(file, "r") is equivalent to open(f"{file}", "r"), but more efficient: there is no need to use interpolation when you already have the value you want in a variable.
Also, you're not closing your files. You should add participant.close() in your loop, or, even better, use a context manager:
for file in files:
    with open(os.path.join(path, file), "r") as participant:
        ...
The with puts your file handle in a context manager, and the file gets closed automatically when you leave the scope of the context manager.
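Putting the pieces together, here is a minimal sketch of the whole task. It assumes every participant file starts with the same header row, which the question does not state, and the function name combine_csvs is made up for illustration:

```python
import csv
import os


def combine_csvs(folder, output_path):
    """Append the rows of every .csv file in `folder` to one output file.

    Assumption: each input file starts with the same header row. The header
    is written once, taken from the first non-empty file.
    Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    # sorted() gives a stable order; listdir() itself promises none
    names = sorted(n for n in os.listdir(folder) if n.endswith(".csv"))
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for name in names:
            # listdir() returns bare names, so join them to the folder
            with open(os.path.join(folder, name), newline="") as participant:
                reader = csv.reader(participant)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

With the layout from the question you would call combine_csvs("results", "results.csv"). Keep the output file outside the input folder, otherwise a second run would pick up the previous output as an input.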

Related

get List of recently added .csv Files into the directory using python

I have an output folder where all the files get dumped. I need to check the folder every five minutes and pick up the list of recently added files using Python.
One way of doing this is to use sets and take the files that are not in the intersection. Is there a better approach?
A code snippet would be much appreciated.
Thanks
To solve this, you can use listdir() from the os module and sleep() from the time module.
import os
from time import sleep
path = "/path/to/folder/with/csv/files"
with open("log.txt", "a+") as log_file:
    while True:
        log_file.seek(0)
        existing = [f.strip() for f in log_file]
        csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
        if len(csvs) > 0:
            print(f"Found {len(csvs)} new file(s):")
            for f in csvs:
                print(f)
            print("\n")
        else:
            print("Found 0 new files.")
        log_file.writelines([f"{f}\n" for f in csvs])
        sleep(300)
We will be storing the existing file names in a .txt file. You could use a .json file or any other file type you like. Firstly, we open the file using with/open (in append/read mode) and get a list of the file names that have previously been stored in the text file. We then get a list of all of the .csv files in that directory that are not in the file:
csvs = [f for f in os.listdir(path) if f.endswith(".csv") and f not in existing]
os.listdir(path) lists all of the files and folders in the directory given by path (the current working directory if no argument is passed).
The following if/else statement is simply for output purposes and is not required. It is only saying: if new csv files were found, print how many and the names of each. If none were found, print that zero were found.
All that's left to do is write the newly discovered file names into the .txt file so that on the next iteration, they will be marked as existing and not new:
log_file.writelines([f"{f}\n" for f in csvs])
The final line, sleep(300), makes the program wait 300 seconds, or 5 minutes, to iterate again.
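If you don't need the list to survive a restart, the set-based idea from the question also works and avoids the log file. This is only a sketch; find_new_csvs and the seen set are names invented here:

```python
import os


def find_new_csvs(folder, seen):
    """Return the .csv names in `folder` that are not in `seen`, sorted.

    `seen` is a set of names found on earlier calls; it is updated in place
    so the same file is reported only once.
    """
    current = {name for name in os.listdir(folder) if name.endswith(".csv")}
    new = current - seen  # set difference: names we have not seen before
    seen |= new
    return sorted(new)
```

You would create seen = set() once, then call find_new_csvs(path, seen) inside the while loop, followed by sleep(300). Unlike the log-file version, everything is forgotten when the script stops.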

Python Delete Files in Directory from list in Text file

I've searched through many answers on deleting multiple files based on certain parameters (e.g. all txt files). Unfortunately, I haven't seen anything where one has a longish list of files saved to a .txt (or .csv) file and wants to use that list to delete files from the working directory.
I have my current working directory set to where the .txt file is (text file with list of files for deletion, one on each row) as well as the ~4000 .xlsx files. Of the xlsx files, there are ~3000 I want to delete (listed in the .txt file).
This is what I have done so far:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
list = open('DeleteFiles.txt')
for f in list:
    os.remove(f)
This gives me the error:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'Test1.xlsx\n'
I feel like I'm missing something simple. Any help would be greatly appreciated!
Thanks
Strip the trailing '\n' from each line read from the text file;
Make an absolute path by joining path with the file name;
Do not overwrite Python built-in names (in your case, list);
Close the text file, or use with open('DeleteFiles.txt') as flist.
EDIT: Actually, upon looking at your code, due to os.chdir(path), the second point may not be necessary.
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
flist = open('DeleteFiles.txt')
for f in flist:
    fname = f.rstrip() # or depending on situation: f.rstrip('\n')
    # or, if you get rid of os.chdir(path) above,
    # fname = os.path.join(path, f.rstrip())
    if os.path.isfile(fname): # this makes the code more robust
        os.remove(fname)
# also, don't forget to close the text file:
flist.close()
As Henry Yik pointed out in the comments, you need to pass the full path to os.remove. Also, open just returns a file object; you need to read the lines from it. And don't forget to close the file. A solution would be:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
# added the argument "r" to indicate read-only mode
list_file = open('DeleteFiles.txt', "r")
# renamed the variable list to _list so that it does not shadow
# the built-in function and type list
_list = list_file.read().splitlines()
list_file.close()
for f in _list:
    os.remove(os.path.join(path,f))
A further improvement would be to use a with block, which "automagically" closes the file for us, and a list comprehension instead of a loop:
with open('DeleteFiles.txt', "r") as list_file:
    _list = list_file.read().splitlines()
[os.remove(os.path.join(path,f)) for f in _list]
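For a list of ~3000 names it can help to know which entries were not found instead of stopping at the first missing file. A rough sketch using pathlib follows; the function name delete_listed_files and its return value are my own choices, not something from the answers above:

```python
from pathlib import Path


def delete_listed_files(list_path, folder):
    """Delete every file named in `list_path` (one name per line) from `folder`.

    Returns (deleted, missing): the names removed and the names not found.
    """
    folder = Path(folder)
    deleted, missing = [], []
    with open(list_path) as flist:
        for line in flist:
            name = line.strip()  # drop the trailing newline
            if not name:
                continue  # skip blank lines
            target = folder / name
            if target.is_file():
                target.unlink()  # remove the file
                deleted.append(name)
            else:
                missing.append(name)
    return deleted, missing
```

Because the folder is passed in explicitly, this version does not depend on os.chdir(). Printing the missing list afterwards is a cheap way to catch typos in the text file.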

taking data from files which are in folder

How do I get the data from multiple txt files that placed in a specific folder. I started with this could not fix. It gives an error like 'No such file or directory: '.idea' (??)
(Let's say I have an A folder and in that, there are x.txt, y.txt, z.txt and so on. I am trying to get and print the information from all the files x,y,z)
def find_get(folder):
    for file in os.listdir(folder):
        f = open(file, 'r')
        for data in open(file, 'r'):
            print data
find_get('filex')
Thanks.
If you just want to print each line:
import glob
import os
def find_get(path):
    # glob.glob() returns paths that already include `path`,
    # so they can be opened directly without joining again
    for f in glob.glob(os.path.join(path, "*.txt")):
        with open(f) as data:
            for line in data:
                print(line)
glob will find only your .txt files in the specified path.
Your error comes from not joining the path to the filename: unless the file is in the directory you run the code from, Python cannot find it without the full path. Another issue is that you seem to have a directory named .idea, which would also give you an error when you try to open it as a file. This also presumes you actually have permission to read the files in the directory.
If your files were larger I would avoid reading all into memory and/or storing the full content.
First of all, make sure you add the folder name to the file name, so the file can be found relative to where the script is executed.
To do so, use os.path.join, which, as its name suggests, joins paths. So, using a generator:
import os
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield f.read()
# this consumes the generator to a list
files_data = list(find_get('filex'))
See what we got in the list that consumed the generator:
print(files_data)
It may be more convenient to produce tuples which can be used to construct a dict:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield (relative_file_path, f.read(), )
# this consumes the generator to a dict
files_data = dict(find_get('filex'))
You will now have a mapping from each file's path to its content.
Also, take a look at the answer by @Padraic Cunningham. He brought up the glob module, which is suitable in this case.
The error you're facing is simple: listdir returns filenames, not full pathnames. To turn them into pathnames you can access from your current working directory, you have to join them to the directory path:
for filename in os.listdir(directory):
    pathname = os.path.join(directory, filename)
    with open(pathname) as f:
        # do stuff
So, in your case, there's a file named .idea in the folder directory, but you're trying to open a file named .idea in the current working directory, and there is no such file.
There are at least four other potential problems with your code that you also need to think about and possibly fix after this one:
You don't handle errors. There are many very common reasons you may not be able to open and read a file--it may be a directory, you may not have read access, it may be exclusively locked, it may have been moved since your listdir, etc. And those aren't logic errors in your code or user errors in specifying the wrong directory, they're part of the normal flow of events, so your code should handle them, not just die. Which means you need a try statement.
You don't do anything with the files but print out every line. Basically, this is like running cat folder/* from the shell. Is that what you want? If not, you have to figure out what you want and write the corresponding code.
You open the same file twice in a row, without closing in between. At best this is wasteful, at worst it will mean your code doesn't run on any system where opens are exclusive by default. (Are there such systems? Unless you know the answer to that is "no", you should assume there are.)
You don't close your files. Sure, the garbage collector will get to them eventually--and if you're using CPython and know how it works, you can even prove the maximum number of open file handles that your code can accumulate is fixed and pretty small. But why rely on that? Just use a with statement, or call close.
However, none of those problems are related to your current error. So, while you have to fix them too, don't expect fixing one of them to make the first problem go away.
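To make the first and last points concrete, here is one way the try statement and the with statement could fit together. Treat it as a sketch: read_text_files is a hypothetical name, and collecting errors in a dict instead of logging them is just one option:

```python
import os


def read_text_files(folder):
    """Return ({path: text}, {path: error}) for the entries of `folder`."""
    contents, errors = {}, {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        try:
            # the with statement closes the file even if read() fails
            with open(path) as f:
                contents[path] = f.read()
        except (OSError, UnicodeDecodeError) as exc:
            # OSError covers directories, vanished files and permission
            # problems; UnicodeDecodeError covers files that are not text
            errors[path] = exc
    return contents, errors
```

With a folder like the one in the question, the .idea directory would end up in errors instead of killing the loop, and every readable file would still be processed.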
Full variant:
import os
def find_get(path):
    files = {}
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path,file)):
            with open(os.path.join(path,file), "r") as data:
                files[file] = data.read()
    return files
print(find_get("filex"))
Output:
{'1.txt': 'dsad', '2.txt': 'fsdfs'}
After that you could generate one file from that content, etc.
Key things:
os.listdir returns a list of names without the full path, so you need to join the initial path with each found item before operating on it.
a dict is a natural fit for collecting the results :)
os.listdir returns files and folders, so you need to check that a list item really is a file
You should check if the file is actually file and not a folder, since you can't open folders for reading. Also, you can't just open a relative path file, since it is under a folder, so you should get the correct path with os.path.join. Check below:
import os
def find_get(folder):
    for file in os.listdir(folder):
        file_path = os.path.join(folder, file)
        # isfile() needs the joined path, not the bare name
        if not os.path.isfile(file_path):
            continue # skip directories
        with open(file_path, 'r') as f:
            for line in f:
                print(line)

How to open a list of files in Python

I'm reading data file (text), and generating a number of reports, each one is written to a different output file (also text). I'm opening them the long way:
fP = open('file1','w')
invP = open('inventory','w')
orderP = open('orders','w')
... and so on, with a corresponding group of close() lines at the end.
If I could open them with a for loop, using a list of fP names and file names, I could guarantee closing the same files.
I tried using a dictionary of fp:filename, but that [obviously] didn't work, because either the fP variable is undefined, or a string 'fP' isn't a good file object name.
Since these are output files, I probably don't need to check for open errors - if I can't open one or more, I can't go on anyway.
Is there any way to open a group of files (not more than 10 or so) from a list of names, in a loop?
Good news! Python 3.3 brings in a standard safe way to do this:
contextlib.ExitStack
From the docs:
Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed.
(...)
Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested with statements had been used with the registered set of callbacks.
Here's an example how to use it:
from contextlib import ExitStack
with ExitStack() as stack:
    files = [
        stack.enter_context(open(filename))
        for filename in filenames
    ]
    # ... use files ...
When the code leaves the with statement, all files that have already been opened will be closed.
This way you also know that if two files get opened and then a third file fails to open, the two already-opened files will be closed correctly. Also, if an exception is raised at any time inside the with block, you'll see correct cleanup.
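Since the question is about output files, here is a small variation that opens them for writing and keeps them in a dict keyed by name. It is only an illustration; write_reports and the reports mapping are hypothetical names:

```python
from contextlib import ExitStack


def write_reports(reports):
    """Write each {filename: text} pair, keeping all files open together.

    Returns a list of each file's `closed` flag, checked after the with
    block has ended.
    """
    with ExitStack() as stack:
        # every file opened here is closed when the with block ends
        handles = {
            name: stack.enter_context(open(name, "w"))
            for name in reports
        }
        for name, text in reports.items():
            handles[name].write(text)
    return [handle.closed for handle in handles.values()]
```

Inside the with block you can write to handles['inventory'], handles['orders'] and so on in any order, which is the "all files open at once" situation the question describes.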
Yes, you can use a dict comprehension:
filenames = ['file1.txt', 'file2.txt', 'file3.txt'...]
filedata = {filename: open(filename, 'w') for filename in filenames}
Now, all of the open file objects are saved in filedata, keyed by the name of the file.
To close them:
for file in filedata.values():
    file.close()
Since you say there are many data files, instead of entering the filenames manually into a list, you can collect them like this:
from os import listdir
from os.path import isfile, join
files_in_dir = [ f for f in listdir('/home/cam/Desktop') if isfile(join('/home/cam/Desktop',f)) ]
Now you can
for file in files_in_dir:
    # listdir() returns bare names, so join them to the directory again
    with open(join('/home/cam/Desktop', file), 'w') as f:
        f.do_something
Use the with keyword to guarantee that opened files (and other similar resources, known as "context managers") are closed:
with open(file_path, 'w') as output_file:
    output_file.write('whatever')
Upon exiting the with block, the file will be properly closed -- even if an exception occurs.
You could easily loop over a list of paths to the desired files:
files = ['fp1.txt', 'inventory', 'orders']
for file in files:
    with open(file, 'w') as current_file:
        current_file.do_some_stuff()
You can open as many files as you want and keep them in a list to close them later:
fds = [open(path, 'w') for path in paths]
# ... do stuff with files
# close files
for fd in fds:
    fd.close()
Or you could use a dictionary for better readability:
# map each file name to an open file object
files = {path: open(path, 'w') for path in paths}
# use the file objects through the mapping
files['inventory'].write("Something")
# close files
for path in files:
    files[path].close()
Both answers above are good if you know or define the list of files ahead of time. If you want a more generic solution, you can build the list just in time: use your OS to create empty files on disk (how to do this depends on your OS), then build the list of files interactively this way:
import os
working_folder = input("Enter full path for the working folder/directory: ")
os.chdir(working_folder)
filenames_list = os.listdir()
#you can filter too, if you need so:
#filenames_list = [filename for filename in os.listdir() if '.txt' in filename]
#then you can do as Reut Sharabani and A.J. suggest above and make a list of file descriptors
file_descriptors = [open(filename, 'w') for filename in filenames_list]
#or a dictionary as Reut Sharabani suggests (I liked that one Reut :)
#do whatever you need to do with all those files already opened for writing
#then close the files
for fd in file_descriptors:
    fd.close()
It is OK to use "with", as some suggest, if you work with only one file at a time (from start to finish), but if you want to work with all the files at the same time, a list or dictionary of open files is better.

Walking sub directories in Python and saving to same sub directory

First of all thanks for reading this. I am a little stuck with sub directory walking (then saving) in Python. My code below is able to walk through each sub directory in turn and process a file to search for certain strings, I then generate an xlsx file (using xlsxwriter) and post my search data to an Excel.
I have two problems...
The first problem I have is that I want to process a text file in each directory, but the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt (would I use glob here?)
The second problem is that when I open/create an Excel I would like to save the file to the same sub directory where the .txt file has been found and processed. Currently my Excel is saving to the python script directory, and consequently gets overwritten each time a new sub directory is opened and processed. Would it be wiser to save the Excel at the end to the sub directory or can it be created with the current sub directory path from the start?
Here's my partially working code...
for root, subFolders, files in os.walk(dir_path):
    if 'Textfile.txt' in files:
        with open(os.path.join(root, 'Textfile.txt'), 'r') as f:
            #f = open(file, "r")
            searchlines = f.readlines()
            searchstringsFilter1 = ['Filter Used :']
            searchstringsFilter0 = ['Filter Used : 0']
            timestampline = None
            timestamp = None
        f.close()
        # Create a workbook and add a worksheet.
        workbook = xlsxwriter.Workbook('Excel.xlsx', {'strings_to_numbers': True})
        worksheetFilter = workbook.add_worksheet("Filter")
Thanks again for looking at this problem.
MikG
I will not solve your code completely, but here are hints:
the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt
You can list all the files in the directory, then check each file's extension:
for filename in files:
    if filename.endswith('.txt'):
        # do stuff
Also, when creating the workbook, can you pass a path? You have root, right? Why not use it?
You don't want glob because you already have a list of files in the files variable. So, filter it to find all the text files:
import fnmatch
txt_files = filter(lambda fn: fnmatch.fnmatch(fn, '*.txt'), files)
To save the file in the same subdirectory:
outfile = os.path.join(root, 'someoutfile.txt')
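Combining both hints, a rough outline of the loop could look like the following. To keep it runnable without xlsxwriter it writes the matching lines to a plain text file; the function name process_tree, the needle default and the _matches.out suffix are all placeholders of mine:

```python
import os


def process_tree(dir_path, needle="Filter Used :"):
    """For every .txt file under `dir_path`, save its matching lines to a
    sibling file in the same subdirectory. Returns the output paths."""
    outputs = []
    for root, sub_folders, files in os.walk(dir_path):
        for filename in files:
            if not filename.endswith(".txt"):
                continue  # only process text files, whatever their name
            with open(os.path.join(root, filename)) as f:
                matches = [line for line in f if needle in line]
            # build the output path from root so it lands beside the input
            stem = os.path.splitext(filename)[0]
            outfile = os.path.join(root, stem + "_matches.out")
            with open(outfile, "w") as out:
                out.writelines(matches)
            outputs.append(outfile)
    return outputs
```

For the Excel case, the same idea applies: pass os.path.join(root, 'Excel.xlsx') as the filename to xlsxwriter.Workbook instead of the bare 'Excel.xlsx', so each workbook is created in the subdirectory being processed and nothing gets overwritten.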
