Directory Name + File Name to Text File - python

I have many directories with files in them. I want to create a comma-delimited txt file showing the directory name and the files within that particular directory; see the example below:
What I'm looking for:
DirName,Filename
999,123.tif
999,1234.tif
999,abc.tif
900,1236.tif
900,xyz.tif
...etc
The Python code below pushes a list of those file paths into a text file; however, I'm unsure how to format the list as described above.
My current code looks like:
Update
I've been able to format the text file now; however, I'm noticing that not all of the directory names/filenames are being written to the text file. It is only writing ~4000 of the ~8000 dir/files. Is there some sort of limit that I'm reaching with the number of rows in the text file, the size of the list (mylist), or some bad dir/file character that is stopping it (see updated code below)?
import os
from os import listdir
from os.path import isfile, join

root = r'C:\temp'
mylist = ['2']
for path, subdirs, files in os.walk(root):
    for name in files:
        mylist.append(os.path.join(path, name))

txt = open(r'C:\temp\output.txt', 'w')
txt.write('dir' + ',' + 'file' + '\n')
for item in mylist:
    list = mylist.pop(0)
    dir, filename = os.path.basename(os.path.dirname(list)), os.path.basename(list)
    txt.write(dir + ',' + filename + '\n')

with open(r'C:\temp\output.txt', 'r') as f:
    read_data = f.read()
Thank You

Maybe this helps:
You could get the absolute file paths and then do the following:
import os.path
p = "/tmp/999/123.tif"
dir, filename = os.path.basename(os.path.dirname(p)), os.path.basename(p)
Result:
In [21]: dir
Out[21]: '999'
In [22]: filename
Out[22]: '123.tif'
Also consider using the csv module to write this kind of file.
import csv
import os.path

# You already have a list of absolute paths
files = ["/tmp/999/123.tif"]

# csv writer (use 'w' with newline='' on Python 3)
with open('/tmp/out.csv', 'w', newline='') as out_file:
    csv_writer = csv.writer(out_file, delimiter=',')
    csv_writer.writerow(('DirName', 'Filename'))
    for f in files:
        csv_writer.writerow((os.path.basename(os.path.dirname(f)),
                             os.path.basename(f)))
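For the directory/file listing the question asks for, here is a minimal sketch (assuming the same C:\temp root and a C:\temp\output.csv output path) that walks the tree with os.walk and writes one DirName,Filename row per file, without building and popping an intermediate list:
import csv
import os

root = r'C:\temp'

with open(r'C:\temp\output.csv', 'w', newline='') as out_file:
    csv_writer = csv.writer(out_file)
    csv_writer.writerow(('DirName', 'Filename'))
    for path, subdirs, files in os.walk(root):
        for name in files:
            # path is the directory currently being walked; its basename
            # is the immediate parent directory of each file in it
            csv_writer.writerow((os.path.basename(path), name))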

Related

How to batch process text files in path and create variables according to their file name

Let's say I have multiple text files in my path C:/Users/text_file/
I want to process them and, in a loop, set a variable for each processed text file, named after the file.
To give an idea, if I have in the text_file folder:
readfile_1.txt, readfile_2.txt, readfile_3.txt, .....,....,.... , readfile_n.txt
and I want to preprocess them with
with open(file_path, 'r', encoding='utf8') as f:
    processed = [x.strip() for x in f]
I did
import glob, os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))
txtfiles = {}
for file in files:
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    txtfiles[file_path] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)
But what I want from the loop is variables with the prefix cc, i.e. cc_readfile_1, cc_readfile_2 and cc_readfile_3,
so that whenever I call cc_readfile_1 or cc_readfile_2, the output is as it would be if done one by one, i.e.
with open(r'C:\Users\text_file\readfile_1.txt', 'r', encoding='utf8') as f:
    cc_readfile_1 = [x.strip() for x in f]
print(cc_readfile_1)
If you want to know why I need this: I have over 100 text files which I need to process and keep in variables in a Python notebook for further analysis. I do not want to execute the code 100 times, renaming the file names and variables each time.
You can use f-strings to generate the correct key:
You will then be able to access them in the dictionary.
import glob, os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))
txtfiles = {}
for file in files:
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    txtfiles[f"cc_{file_path}"] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)
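For example, assuming readfile_1.txt is one of the files in that folder, you could get its processed lines back from the dictionary like this (a sketch; the key is just the cc_ prefix plus the file_path built in the loop):
cc_key = "cc_" + path + "readfile_1.txt"   # i.e. "cc_C:/Users/text_file/readfile_1.txt"
print(txtfiles[cc_key])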
Use a dictionary where the keys are the files' basenames without extension. There's no real point in adding a constant prefix (cc_).
So, for example, if the filename is readfile_1.txt then the key would simply be readfile_1
The value associated with each key should be a list of all of the (stripped) lines in the file.
from os.path import join, basename, splitext
from glob import glob

PATH = 'C:/Users/text_file'
EXT = '*.txt'

all_files = dict()

for file in glob(join(PATH, EXT)):
    with open(file) as infile:
        key = splitext(basename(file))[0]
        all_files[key] = list(map(str.strip, infile))
Subsequently, to access the lines from readfile_1.txt it's just:
all_files['readfile_1']

Python script to merge more than 200 very large csv files into just one

I have been trying to merge several .csv files from different subfolders (all with the same name) into one. I tried with R but ran out of memory for the process (it should merge more than 20 million rows). I am now working with a Python script to try to get it done (see below). The files have many columns too, so I don't need all of them, but I also don't know if I can choose which columns to add to the new csv:
import glob
import csv
import os

path = 'C:\\path\\to\\folder\\where\\all\\files\\are-allowated-in-subfolders'
result = glob.glob('*/certificates.csv')
#for i in result:
#    full_path = "C:\\path\\to\\folder\\where\\all\\files\\are-allowated-in-subfolders\\" + result
#    print(full_path)
os.chdir(path)
i = 0
for root, directories, files in os.walk(path, topdown=False):
    for name in files:
        print(name)
        try:
            i += 1
            if i % 10000 == 0:
                # just to see the progress
                print(i)
            if name == 'certificates.csv':
                creader = csv.reader(open(name))
                cwriter = csv.writer(open('processed_' + name, 'w'))
                for cline in creader:
                    new_line = [val for col, val in enumerate(cline)]
                    cwriter.writerow(new_line)
        except:
            print('problem with file: ' + name)
            pass
but it doesn't work, and it doesn't return any error either, so at the moment I am completely stuck.
Your indentation is wrong, and you are overwriting the output file for each new input file. Also, you are not using the glob result for anything. If the files you want to read are all immediately in subdirectories of path, you can do away with the os.walk() call and do the glob after you os.chdir().
import glob
import csv
import os

# No real need to have a variable for this really
path = 'C:\\path\\to\\folder\\where\\all\\files\\are-allowated-in-subfolders'
os.chdir(path)

# Obviously, can't use input file name in output file name
# because there is only one output file for many input files
with open('processed.csv', 'w') as dest:
    cwriter = csv.writer(dest)
    for i, name in enumerate(glob.glob('*/certificates.csv'), 1):
        if i % 10000 == 0:
            # just to see the progress
            print(i)
        try:
            with open(name) as csvin:
                creader = csv.reader(csvin)
                for cline in creader:
                    # no need to enumerate fields
                    cwriter.writerow(cline)
        except:
            print('problem with file: ' + name)
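The question also mentions not needing every column; a minimal sketch of selecting columns before writing (the indices in wanted are placeholders, pick the ones you actually need) would be:
import csv
import glob

wanted = [0, 3]  # hypothetical column indices to keep

with open('processed.csv', 'w', newline='') as dest:
    cwriter = csv.writer(dest)
    for name in glob.glob('*/certificates.csv'):
        with open(name, newline='') as csvin:
            for row in csv.reader(csvin):
                # keep only the selected columns of each row
                cwriter.writerow([row[i] for i in wanted])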
You probably just need to keep a merged.csv file open whilst reading in each of the certificates.csv files. glob.glob() can be used to recursively find all suitable files:
import glob
import csv
import os

path = r'C:\path\to\folder\where\all\files\are-allowated-in-subfolders'
os.chdir(path)

with open('merged.csv', 'w', newline='') as f_merged:
    csv_merged = csv.writer(f_merged)
    for filename in glob.glob(os.path.join(path, '**', 'certificates.csv'), recursive=True):
        print(filename)
        try:
            with open(filename) as f_csv:
                csv_merged.writerows(csv.reader(f_csv))
        except:
            print('problem with file: ', filename)
An r prefix can be added to your path to avoid needing to escape each backslash. Also newline='' should be added to the open() when using a csv.writer() to stop extra blank lines being written to your output file.

Python : How to call csv files from the directory based on the list of items?

I have a list of items and I want to call only those csv files from the directory whose names are similar to the items in the list. I am doing something like below, but it's reading all the files from the directory. Please suggest.
x_list = [a, b, c, d]
files in directory = [a.csv, b.csv, c.csv, d.csv, e.csv, f.csv]
for files in os.listdir(dir + 'folder\\'):
    file_name = files[:files.find('.')]
    if file_name in x_list:
        print(file_name)
From the official doc: "The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order."
import glob
dirPath = "/home/.../"
pattern = dirPath + '*.csv'
for file_path in glob.glob(pattern):
    print(file_path)
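To restrict that glob result to the names in the question's x_list, a sketch (assuming x_list holds the names without the .csv extension) would be:
import glob
import os

dirPath = "/home/.../"          # same placeholder directory as above
x_list = ['a', 'b', 'c', 'd']   # names without the .csv extension

for file_path in glob.glob(dirPath + '*.csv'):
    name = os.path.splitext(os.path.basename(file_path))[0]
    if name in x_list:
        print(file_path)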
You can use regular expressions for which you'll need the re module.
import re
print(*(file for file in os.listdir() if re.match(r".*\.csv", file) and file[:-4] in x_list), sep='\n')
You can also use the filter function:
print(*filter(lambda file: re.match(r".*\.csv", file) and file[:-4] in x_list, os.listdir()), sep='\n')
You can just open the CSVs on your list. This is what you could do:
# a b c d are names of CSVs without the .csv extension
my_csvs_names = 'a b c d'.split()
my_csvs_folder = 'c/path/to/my_folder'

for csv_name in my_csvs_names:
    file_name = f'{my_csvs_folder}/{csv_name}.csv'
    try:
        with open(file_name, 'r') as f_in:
            for line in f_in:
                # do something with it
                print(line)
    except FileNotFoundError:
        print(f'{file_name} does not exist')
In the "do something with it" part, you can append values to a variable or do whatever you are trying to achieve.

Is there a way to read n text files in a folder and store them as n str variables?

I want to read N text files in a folder and store them as N variables. Note: the input will just be the folder path, and the number of text files in it may vary (hence N).
Manually, I do it like the code below, which needs to be completely changed:
import os
os.chdir('C:/Users/Documents/0_CDS/fileread') # Work DIrectory
#reading file
File_object1 = open(r"abc","r")
ex1=File_object1.read()
File_object2 = open(r"def.txt","r")
ex2=File_object2.read()
File_object3 = open(r"ghi.txt","r")
ex3=File_object3.read()
File_object4 = open(r"jkl.txt","r")
ex4=File_object4.read()
File_object5 = open(r"mno.txt","r")
ex5=File_object5.read()
You can use Python's built-in dict. Here I just use each input's filename as its key; you can name them any way you like.
import os

path = 'Your Directory'
result_dict = {}
for root, dirs, files in os.walk(path):
    for f in files:
        # join with root so files in subdirectories are opened correctly
        with open(os.path.join(root, f), 'r') as myfile:
            result_dict[f] = myfile.read()
If you are not interested in the file names, only the content, and there are only files in the dir:
from os import listdir
l = [open(f).read() for f in listdir('.')]

How to read a lot of txt files in a specific folder using Python

Please help me, I have some txt files in a folder. I want to read them and summarise all the data into one txt file. How can I do it with Python?
For example:
folder name : data
file name in that folder : log1.txt
log2.txt
log3.txt
log4.txt
data in log1.txt : Size: 1,116,116,306 bytes
data in log2.txt : Size: 1,116,116,806 bytes
data in log3.txt : Size: 1,457,116,806 bytes
data in log4.txt : Size: 1,457,345,000 bytes
My expected output:
a txt file result.txt whose data is: 1,116,116,306
1,116,116,806
1,457,116,806
1,457,345,000
Did you mean you want to read the contents of each file and write all of them into a different file?
import os

# returns the names of the files in the directory data as a list
list_of_files = os.listdir("data")
lines = []
for file in list_of_files:
    f = open(os.path.join("data", file), "r")
    # add each line in the file to the list
    lines.extend(f.readlines())
    f.close()

# write the collected lines to result.txt
result = open("result.txt", "w")
result.writelines(lines)
result.close()
If you are looking for the size of each file instead of the contents, change the two lines:
f = open(os.path.join("data", file), "r")
lines.extend(f.readlines())
to:
lines.append(str(os.stat(os.path.join("data", file)).st_size) + "\n")
File concat.py
#!/usr/bin/env python
import sys, os

def main():
    folder = sys.argv[1]  # the first argument contains the path
    with open('result.txt', 'w') as result:  # the result file will be in the current working directory
        for path in next(os.walk(folder))[2]:  # list all files in the provided path
            with open(os.path.join(folder, path), 'r') as source:
                result.write(source.read())  # write each file to result

main()
Usage: concat.py <your path>
Import os. Then list the folder contents using os.listdir('data') and store it in an array. For each entry you can get the size by calling os.stat(entry).st_size. Each of these entries can now be written to a file.
Combined:
import os

outfile = open('result.txt', 'w')
path = 'data'
files = os.listdir(path)
for file in files:
    outfile.write(str(os.stat(path + "/" + file).st_size) + '\n')
outfile.close()
You have to find all files that you are going to read:
path = "data"
files = os.listdir(path)
You have to read all the files and, for each of them, collect the size and the content:
all_sz = {i:os.path.getsize(path+'/'+i) for i in files}
all_data = ''.join([open(path+'/'+i).read() for i in files])
You need a formatted print:
msg = 'this is ...;'
sp2 = ' ' * 4
sp = ' ' * len(msg) + sp2
print(msg + sp2, end='')
for i in all_sz:
    print(sp, "{:,}".format(all_sz[i]))
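If the goal is the result.txt from the question rather than printed output, the same "{:,}" formatting can be written to a file instead (a sketch reusing the all_sz dict built above):
# one size per line, with thousands separators, e.g. 1,116,116,306
with open('result.txt', 'w') as out:
    for name in files:
        out.write("{:,}\n".format(all_sz[name]))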
If one needs to merge sorted files so that the output file is sorted too,
they can use the merge method from the heapq standard library module.
from heapq import merge
from os import listdir
from os.path import join

files = [open(join(path, f)) for f in listdir(path)]
with open(outfile, 'w') as out:
    for rec in merge(*files):
        out.write(rec)
Records are kept in lexical order; if one needs something different, merge accepts an optional key=... argument to specify a different ordering function.
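For example, to merge numerically rather than lexically (a toy sketch; note that each input must already be sorted under the same key):
from heapq import merge

a = ["1\n", "3\n", "10\n"]   # already sorted numerically
b = ["2\n", "20\n"]
merged = list(merge(a, b, key=lambda line: int(line)))
# merged == ["1\n", "2\n", "3\n", "10\n", "20\n"]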
