Count the number of text files in a certain directory - Python

I am trying to get the number of files in a certain directory, but I want it to count only text files, because the directory also contains another directory and a .DS_Store file. What should I modify to only get the number of text files?
import os

file_list = os.listdir("data/accounts/")
number_files = len(file_list)
print(number_files)

From Count number of files with certain extension in Python:
Solution 1:
import os

fileCounter = 0
for root, dirs, files in os.walk("data/accounts/"):
    for file in files:
        if file.endswith('.txt'):
            fileCounter += 1
Solution 2:
import glob

fileCounter = len(glob.glob1("data/accounts/", "*.txt"))
Read about glob here.
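Note that glob.glob1 is an undocumented helper, so it may change between versions; a minimal sketch of the same count using the documented pathlib API (same "data/accounts/" directory assumed):
from pathlib import Path

# count regular files in the directory whose names end with .txt (non-recursive)
fileCounter = sum(1 for p in Path("data/accounts/").glob("*.txt") if p.is_file())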

An alternative module that may work for you is glob.
It will allow you to use wildcards so you can capture just the files you are interested in.
from glob import glob
filenames = glob("data/accounts/*.txt")
number_of_files = len(filenames)
print(number_of_files)

Assuming your text files end in ".txt", you could use something like:
files = [x for x in os.listdir("data/accounts/")
         if os.path.isfile(os.path.join("data/accounts/", x)) and x.endswith('.txt')]
number_files = len(files)
print(number_files)
Using os.path.isfile() to ignore directories and str.endswith() to determine text-file-ness. Note that os.listdir() returns bare names, so each one is joined back onto the directory before the os.path.isfile() check.
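On Python 3.6+, a similar count can be done with os.scandir, which avoids a separate os.path.isfile() call per entry; a minimal sketch:
import os

with os.scandir("data/accounts/") as entries:
    # count regular files whose names end with .txt
    number_files = sum(1 for e in entries if e.is_file() and e.name.endswith('.txt'))
print(number_files)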

number_of_files = sum(f.endswith('.txt') for f in os.listdir("data/accounts/"))
str.endswith() returns True or False
True and False have numeric values of one and zero, respectively.
sum() will consume the generator expression.
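For example, the following evaluates to 2, because exactly two of the names end in '.txt':
sum(name.endswith('.txt') for name in ['a.txt', 'b.csv', 'c.txt'])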

Related

How can I use Python to walk through files in directories and output a pandas data frame given certain constraints?

So I'm using Python, and I have a parent directory with two child directories, which in turn contain many directories, each with three files. I want to take the third file (which is a .CSV file) of each of these directories and parse them together into a pandas dataframe. This is the code I have so far:
import os

rootdir = 'C:\\Dir\\Dir\\Dir\\root(parent)dir'
# os.listdir(rootdir)
# os.getcwd()
filelist = os.listdir(rootdir)
# file_count = len(filelist)

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        # if files.startswith('C74'):
        for name in files:
            r.append(os.path.join(root, name))
    return r

filelist = list_files(rootdir)
Now with "filelist" I get all file paths contained in all directories as strings. Now I need to find:
1. The file names that begin with three specific letters (for example funtest, in this case the first letters being fun)
2. Take every third file, and construct a pandas dataframe from that, so that I can proceed to perform data analysis.
IIUC we can do this much more easily using the recursive glob from pathlib:
from pathlib import Path
import pandas as pd

csv = [f for f in Path(r'parent_dir').rglob('*C74*.csv')]
df = pd.concat([pd.read_csv(f) for f in csv])
If you want to subset your list again, you could do:
subset_list = [x for x in csv if 'abc' in x.stem]
Test
[x for x in csv if 'abc' in x.stem]
out : ['C74_abc.csv', 'abc_C74.csv']
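The question also asks about taking every third file; a minimal sketch of that part, reusing the csv list from above and assuming the files should first be sorted by name:
csv_sorted = sorted(csv)        # sort the matched paths by name
every_third = csv_sorted[2::3]  # the 3rd, 6th, 9th, ... file
df = pd.concat([pd.read_csv(f) for f in every_third], ignore_index=True)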

Is there a way to read n text files in a folder and store them as n str variables?

I want to read N text files in a folder and store them as N variables. Note, the input will just be the folder path, and the number of text files in it may vary (hence N).
Manually I do it like the code below, which needs to be completely changed:
import os
os.chdir('C:/Users/Documents/0_CDS/fileread') # Work DIrectory
#reading file
File_object1 = open(r"abc","r")
ex1=File_object1.read()
File_object2 = open(r"def.txt","r")
ex2=File_object2.read()
File_object3 = open(r"ghi.txt","r")
ex3=File_object3.read()
File_object4 = open(r"jkl.txt","r")
ex4=File_object4.read()
File_object5 = open(r"mno.txt","r")
ex5=File_object5.read()
You can use Python's built-in dict. Here I simply use each file's name as its key; you can name them any way you like.
import os

path = 'Your Directory'
result_dict = {}
for root, dirs, files in os.walk(path):
    for f in files:
        with open(os.path.join(path, f), 'r') as myfile:
            result_dict[f] = myfile.read()
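Each file's content can then be looked up by its name, for example (using def.txt from the question):
print(result_dict['def.txt'])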
If you are not interested in the file names but only in the content, and there are only files in the dir:
from os import listdir
l = [open(f).read() for f in listdir('.')]
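Note that this one-liner assumes the directory contains nothing but regular files. A slightly safer sketch that skips subdirectories and closes each file explicitly (same flat layout assumed otherwise):
from os import listdir
from os.path import isfile, join

l = []
for name in listdir('.'):
    if isfile(join('.', name)):  # skip subdirectories
        with open(join('.', name)) as fh:
            l.append(fh.read())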

Python 3.6 - How to pass file names into unique variables

I'd like to assign a unique variable name to each file from a directory. I have no idea how this can be done. I'm new to Python, so I'm sorry if the code is scruffy.
def DataFinder(path, extension):
    import os
    count = 0
    extensions = ['.txt', '.csv', '.xls', 'xlsm', 'xlsx']
    allfiles = []
    if not extension in extensions:
        print('Can\'t read data from this file type.\n', 'Allowed file types are\n', str(extensions))
    else:
        #loop through the files
        for root, dirs, files in os.walk(path):
            for file in files:
                #check if the file ends with the extension
                if file.endswith(extension):
                    count += 1
                    print(str(count) + ': ' + file)
                    allfiles.append(file)
        if count == 0:
            print('There are no files with', extension, 'extension in this folder.')
    return allfiles
How can this code be modified to assign a variable name like df_number.of.file with each iteration, as a string?
Thanks
My ultimate goal is to have a set of DataFrame objects, one for each file, each under a unique variable name, without the need to create those variables manually.
The suggested duplicate did not answer my question, nor did it work for me.
allfiles = {}
#filter through required data extensions
if not extension in extensions:
    print('Can\'t read data from this file type.\n', 'Allowed file types are\n', str(extensions))
else:
    #loop through the files
    for root, dirs, files in os.walk(path):
        for file in files:
            #check if the file ends with the extension
            if file.endswith(extension):
                #raise counter
                count += 1
                print(str(count) + ': ' + file)
                allfiles.update({'df' + str(count): path + file})
After adjusting the code as suggested my output was a dictionary:
{'df1': 'C:/Users/Bartek/Downloads/First.csv', 'df2': 'C:/Users/Bartek/Downloads/Second.csv', 'df3': 'C:/Users/Bartek/Downloads/Third.csv'}
I achieved a similar thing previously using a list:
['df_1First.csv', 'df_2Second.csv', 'df_3Third.csv']
But my exact question is how to achieve this:
for each object in the dict:
- create a variable with a consecutive object number
so that these variables can be passed as the data argument to pandas.DataFrame().
I know this is a very bad idea (http://stupidpythonideas.blogspot.co.uk/2013/05/why-you-dont-want-to-dynamically-create.html), therefore can you please show me the proper way using a dict?
Many thanks
You should be able to modify this section of the code to accomplish what you desire. Instead of printing out the number of files, use count to create new unique filenames.
if file.endswith(extension):
    count += 1
    newfile = ('df_' + str(count) + file)
    allfiles.append(newfile)
count would be unique for each different file extension. You should be able to find the newly created file names in allfiles.
EDIT to use a dictionary (thanks Rory): I would suggest an alternative route: create a dictionary and use the file name as the key.
allfilesdict = {}
...
if file.endswith(extension):
    count += 1
    newfile = ('df_' + str(count) + file)
    allfilesdict[file] = newfile
Then remember to return allfilesdict if you are going to use it somewhere outside of your function.
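Since the ultimate goal is a set of DataFrame objects, one per file, here is a minimal sketch of the dict-based approach that actually loads the data. It assumes pandas is available, reuses path, extension, and the counter from the question, and assumes the files are CSVs (other extensions would need pd.read_excel or similar):
import os
import pandas as pd

frames = {}
count = 0
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(extension):
            count += 1
            # keys 'df_1', 'df_2', ... each map to the DataFrame loaded from that file
            frames['df_' + str(count)] = pd.read_csv(os.path.join(root, file))
frames['df_1'] is then the first matching file's DataFrame, with no dynamically created variable names needed.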
You can modify the first script like this:
from time import gmtime, strftime
import os

def DataFinder(path, extension):
    count = 0
    extensions = ['.txt', '.csv', '.xls', 'xlsm', 'xlsx']
    allfiles = []
    if not extension in extensions:
        print('Can\'t read data from this file type.\n', 'Allowed file types are\n', str(extensions))
    else:
        #loop through the files
        for root, dirs, files in os.walk(path):
            for file in files:
                #check if the file ends with the extension
                if file.endswith(extension):
                    count += 1
                    #take the current date and time
                    date_time = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                    #split the file name on the dot: file_name[0] is the base name, file_name[1] is the extension
                    file_name = file.split('.')
                    allfiles.append(file_name[0] + date_time + '.' + file_name[1])
        if count == 0:
            print('There are no files with', extension, 'extension in this folder.')
    return allfiles

print(DataFinder('/home/user/tmp/test', '.csv'))

Rename files in a directory with incremental index

INPUT: I want to add increasing numbers to file names in a directory sorted by date. For example, add "01_", "02_", "03_"...to these files below.
test1.txt (oldest text file)
test2.txt
test3.txt
test4.txt (newest text file)
Here's the code so far. I can get the file names, but each character in the file name seems to be its own item in a list.
import os

for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print(file)
The EXPECTED results are:
01_test1.txt
02_test2.txt
03_test3.txt
04_test4.txt
with test1 being the oldest and test4 being the newest.
How do I add a 01_, 02_, 03_, 04_ to each file name?
I've tried something like this. But it adds a '01_' to every single character in the file name.
new_test_names = ['01_'.format(i) for i in file]
print (new_test_names)
If you want to number your files by age, you'll need to sort them first: call sorted and pass os.path.getmtime as the key parameter, which sorts by modification time in ascending order (oldest to newest).
Use glob.glob to get all the text files in a given directory. It is not recursive as written, but a recursive extension is a minimal addition if you are using Python 3.
Use str.zfill to build the zero-padded prefixes of the form 0x_.
Use os.rename to rename your files.
import glob
import os

sorted_files = sorted(
    glob.glob('path/to/your/directory/*.txt'), key=os.path.getmtime)

for i, f in enumerate(sorted_files, 1):
    try:
        head, tail = os.path.split(f)
        os.rename(f, os.path.join(head, str(i).zfill(2) + '_' + tail))
    except OSError:
        print('Invalid operation')
It always helps to make a check using try-except, to catch any errors that shouldn't be occurring.
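As noted above, the recursive variant is a small addition on Python 3; a minimal sketch using glob's ** pattern with recursive=True (assuming the same directory layout):
sorted_files = sorted(
    glob.glob('path/to/your/directory/**/*.txt', recursive=True),
    key=os.path.getmtime)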
This should work:
import glob
new_test_names = ["{:02d}_{}".format(i, filename) for i, filename in enumerate(glob.glob("/Users/Admin/Documents/Test/*.txt"), start=1)]
Or without a list comprehension:
for i, filename in enumerate(glob.glob("/Users/Admin/Documents/Test/*.txt"), start=1):
    print("{:02d}_{}".format(i, filename))
Three things to learn about here:
glob, which makes this sort of file matching easier.
enumerate, which lets you write a loop with an index variable.
format, specifically the 02d modifier, which prints two-digit numbers (zero-padded).
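These snippets only print the new names; to actually rename the files on disk, a minimal sketch combining glob, enumerate, and os.rename (assuming the same test directory) could look like this:
import glob
import os

# number the matched .txt files from 1 and prepend the zero-padded index
for i, path in enumerate(glob.glob("/Users/Admin/Documents/Test/*.txt"), start=1):
    head, tail = os.path.split(path)
    os.rename(path, os.path.join(head, "{:02d}_{}".format(i, tail)))
Note that this keeps glob's default ordering; to number by age as the question asks, sort the list with key=os.path.getmtime first, as in the earlier answer.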
Two methods to format an integer with leading zeros:
1. Use .format:
import os

i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print('{0:02d}'.format(i) + '_' + file)
        i += 1
2. Use .zfill:
import os

i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print(str(i).zfill(2) + '_' + file)
        i += 1
The easiest way is to simply have a variable, such as i, which will hold the number and prepend it to the string using some kind of formatting that guarantees it will have at least 2 digits:
import os

i = 1
for file in os.listdir("/Users/Admin/Documents/Test"):
    if file.endswith(".txt"):
        print('%02d_%s' % (i, file))  # %02d means your number will have at least 2 digits
        i += 1
You can also take a look at enumerate and glob to make your code even shorter (but make sure you understand the fundamentals before using it).
test_dir = '/Users/Admin/Documents/Test'
txt_files = [file
             for file in os.listdir(test_dir)
             if file.endswith('.txt')]
numbered_files = ['%02d_%s' % (i + 1, file)
                  for i, file in enumerate(txt_files)]

Searching multiple text files for two strings?

I have a folder with many text files (EPA10.txt, EPA55.txt, EPA120.txt..., EPA150.txt). I have 2 strings that are to be searched in each file and the result of the search is written in a text file result.txt. So far I have it working for a single file. Here is the working code:
if 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in open('C:\\Temp\\lamip\\EPA150.txt').read():
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('Current MW in node is EPA150')
else:
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('NOT EPA150')
Now I want this to be repeated for all the text files in the folder. Please help.
Given that you have some number of files named from EPA1.txt to EPA150.txt, but you don't know all the names, you can put them all together inside a folder and then get a list of their filenames with the os.listdir() method, e.g. listdir("C:/Temp/lamip").
Also, your if statement is wrong; you should do this instead:
text = file.read()
if "string1" in text and "string2" in text:
Here's the code:
from os import listdir

with open("C:/Temp/lamip/result.txt", "w") as f:
    for filename in listdir("C:/Temp/lamip"):
        with open('C:/Temp/lamip/' + filename) as currentFile:
            text = currentFile.read()
            if ('LZY_201_335_R10A01' in text) and ('LZY_201_186_R5U01' in text):
                f.write('Current MW in node is ' + filename[:-4] + '\n')
            else:
                f.write('NOT ' + filename[:-4] + '\n')
PS: You can use / instead of \\ in your paths, Python automatically converts them for you.
Modularise! Modularise!
Well, not in terms of having to write distinct Python modules, but in terms of isolating the different tasks at hand.
Find the files you wish to search.
Read the file and locate the text.
Write the result into a separate file.
Each of these tasks can be solved independently. I.e. to list the files, you have os.listdir which you might want to filter.
For step 2, it does not matter whether you have 1 or 1,000 files to search. The routine is the same. You merely have to iterate over each file found in step 1. This indicates that step 2 could be implemented as a function that takes the filename (and possible search-string) as argument, and returns True or False.
Step 3 is the combination of each element from step 1 and the result of step 2.
The result:
import os

files = [fn for fn in os.listdir('C:/Temp/lamip') if fn.endswith('.txt')]
# perhaps filter `files`

def does_fn_contain_string(filename):
    with open('C:/Temp/lamip/' + filename) as blargh:
        content = blargh.read()
    # both strings must be present, per the question
    return 'string1' in content and 'string2' in content

with open('results.txt', 'w') as output:
    for fn in files:
        if does_fn_contain_string(fn):
            output.write('Current MW in node is {0}\n'.format(fn[:-4]))
        else:
            output.write('NOT {0}\n'.format(fn[:-4]))
You can do this by creating a for loop that runs through all your .txt files in the current working directory.
import os

with open("result.txt", "w") as resultfile:
    for result in [txt for txt in os.listdir(os.getcwd()) if txt.endswith(".txt")]:
        text = open(result).read()
        # check that both strings are present (a plain `'a' and 'b' in text` would not do this)
        if 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text:
            resultfile.write('Current MW in node is {0}'.format(result[:-4]))
        else:
            resultfile.write('NOT {0}'.format(result[:-4]))
