Find directories missing .csv file in Python

I have ~1000 directories, each containing various .csv files. I want to check whether each directory contains a specific .csv file whose name begins with PTSD_OCOTBER.
If no such file exists in a directory, I want to write that directory's name to a .txt file.
Here is what I have so far:
import os, sys, time, shutil
import subprocess
# determine filetype to look for.
file_type = ".csv"
print("Running file counter for " + repr(file_type))
# for each folder in the root directory
for subdir, dirs, files in os.walk(rootdir):
    if "GeneSet" in subdir:
        folder_name = subdir.rsplit('/', 1)[-1]  # get the folder name.
        for f in files:
            # unclear how to write this part.
            # how to tell if no files exist in directory?
            pass
This successfully finds the .csv files of interest, but how do I achieve the above?

So files is the list of files in that directory that you are currently walking. You want to know if there are no files that start with PTSD_OCOTBER (PTSD_OCTOBER ?):
for subdir, dirs, files in os.walk(rootdir):
    if "GeneSet" in subdir:
        folder_name = subdir.rsplit('/', 1)[-1]  # get the folder name.
        dir_of_interest = not any(f.startswith('PTSD_OCOTBER') for f in files)
        if dir_of_interest:
            pass  # do stuff with folder_name
Now you want to save the results into a text file? If you have a Unix-style computer, then you can use output redirection on your terminal, such as
python3 fileanalysis.py > result.txt
after writing print(folder_name) instead of # do stuff with folder_name.
Or you can use Python itself to write the file, such as:
found_dirs = []
for subdir, dirs, files in os.walk(rootdir):
    ...
    if dir_of_interest:
        found_dirs.append(folder_name)
with open('result.txt', 'w') as f:
    f.write('\n'.join(found_dirs))
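Putting the pieces of this answer together, a minimal self-contained sketch might look like the following. The function names dirs_missing_prefix and write_report, and their parameters, are hypothetical and not from the original posts; the "GeneSet" filter and the PTSD_OCOTBER prefix are taken from the question. It records the full directory path rather than only the folder name, which is an assumption about what is most useful in the report.

```python
import os


def dirs_missing_prefix(rootdir, prefix, suffix=".csv", name_filter="GeneSet"):
    """Return directories under rootdir with no file named <prefix>*<suffix>.

    Only directories whose path contains name_filter are checked,
    mirroring the "GeneSet" test in the question.
    """
    missing = []
    for subdir, dirs, files in os.walk(rootdir):
        if name_filter not in subdir:
            continue
        has_match = any(
            f.startswith(prefix) and f.endswith(suffix) for f in files
        )
        if not has_match:
            missing.append(subdir)
    return missing


def write_report(paths, out_path):
    """Write one directory path per line to out_path."""
    with open(out_path, "w") as out:
        for p in paths:
            out.write(p + "\n")
```

It could be called as write_report(dirs_missing_prefix(rootdir, "PTSD_OCOTBER"), "result.txt"), where rootdir is the top-level folder from the question.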

Related

how to combine all the files of one directory of one extension into one folder

How can I combine all the PDF files under one directory (the PDFs can be at different depths in the directory tree) into one new folder?
I have tried this:
import os
new_root = r'C:\Users\me\new_root'
root_with_files = r'C:\Users\me\all_of_my_pdf_files'
for root, dirs, files in os.walk(root_with_files):
    for file in files:
        os.path.join(new_root, file)
but it doesn't add anything to my folder.

You may try this:
import os
import shutil
new_root = r'C:\Users\me\new_root'
root_with_files = r'C:\Users\me\all_of_my_pdf_files'
for root, dirs, files in os.walk(root_with_files):
    for file in files:
        if file.lower().endswith('.pdf'):  # .pdf files only
            shutil.copy(os.path.join(root, file), new_root)

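Your code doesn't move any files to the new folder: os.path.join only builds a path string. You can move your files using os.replace(src, dst).
Try this:
import os
new_root = r'C:\Users\me\new_root'
# a raw string cannot end with a single backslash, so the trailing one is dropped
root_with_files = r'C:\Users\me\all_of_my_pdf_files'
for root, dirs, files in os.walk(root_with_files):
    for file in files:
        if file.lower().endswith('.pdf'):  # .pdf files only
            os.replace(os.path.join(root, file), os.path.join(new_root, file))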
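Both answers above will silently overwrite a file in the destination if two PDFs in different subfolders share a name. A hedged sketch of one way around that is shown below; the function name collect_pdfs and the "_1", "_2" suffix scheme are assumptions made for illustration, and the destination folder is assumed to lie outside the tree being walked.

```python
import os
import shutil


def collect_pdfs(src_root, dest_dir):
    """Copy every .pdf under src_root into dest_dir and return the new paths.

    Assumes dest_dir is not inside src_root; otherwise os.walk could
    revisit files that were just copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for root, dirs, files in os.walk(src_root):
        for name in files:
            if not name.lower().endswith(".pdf"):
                continue
            stem, ext = os.path.splitext(name)
            target = os.path.join(dest_dir, name)
            counter = 1
            # Pick a free name such as report_1.pdf instead of overwriting.
            while os.path.exists(target):
                target = os.path.join(
                    dest_dir, "{}_{}{}".format(stem, counter, ext)
                )
                counter += 1
            shutil.copy2(os.path.join(root, name), target)
            copied.append(target)
    return copied
```

Swapping shutil.copy2 for shutil.move would move the files instead of copying them.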

Trying to reach all .txt files in Python

I have a folder named "labels". This folder contains 50 subfolders, and each of these 50 subfolders contains .txt files. How can I reach these .txt files using Python 2?

Here's code that will go through all the folders in labels and print the contents of the .txt files located inside them.
import os
for folder in os.listdir('labels'):
    for txt_file in os.listdir('labels/{}'.format(folder)):
        if txt_file.endswith('.txt'):
            # the with block closes the file even if reading fails
            with open('labels/{}/{}'.format(folder, txt_file), 'r') as f:
                content = f.read()
            print(content)

If you just want to list the files in the folders:
import os
rootdir = 'C:/Users/youruser/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
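Because the .txt files sit exactly one level below labels, a glob pattern is another option. The sketch below is only an illustration; the function name find_label_files is hypothetical, and it assumes the folder layout described in the question.

```python
import glob
import os


def find_label_files(labels_dir="labels"):
    """Return sorted paths of .txt files exactly one folder below labels_dir."""
    # "*" matches each subfolder, "*.txt" matches the text files inside it.
    pattern = os.path.join(labels_dir, "*", "*.txt")
    return sorted(glob.glob(pattern))
```

Unlike os.walk, this does not descend further than one level, which matches the structure in the question.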

Copying files in python using shutil

I have the following directory structure:
-mailDir
    -folderA
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -89.txt
            -subInbox
            -subInbox2
    -folderB
        -sub1
        -sub2
        -inbox
            -1.txt
            -2.txt
            -200.txt
            -577.txt
The aim is to copy all the txt files under inbox folder into another folder.
For this I tried the below code
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if(dirName=="inbox"):
            eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        shutil.copy(path.join(ii,i),destDir)
If the inbox directory only has .txt files, the above code works fine. Because the inbox folder under folderA contains subdirectories along with .txt files, the code fails with a permission denied error. As I understand it, shutil.copy cannot copy folders.
The aim is to copy only the .txt files in every inbox folder to another location. If files in different inbox folders have the same name, I have to keep both. How can we improve the code in this case? Please note that everything other than the .txt files is a folder.

One simple solution is to skip any i that does not have the .txt extension, using the string method endswith().
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
    for dirName in dirs:
        if dirName == "inbox":
            # path.join picks the right separator on every platform
            eachInboxFolderPath.append(path.join(root, dirName))
for ii in eachInboxFolderPath:
    for i in os.listdir(ii):
        if i.endswith('.txt'):
            shutil.copy(path.join(ii, i), destDir)
This should ignore any folders and non-txt files that are found with os.listdir(ii). I believe that is what you are looking for.
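The endswith() filter does not address the question's second requirement, keeping files that share a name across different inbox folders. One possible sketch is to build each destination name from the file's path relative to the source folder; the function name copy_inbox_txt and the hyphen-joined naming scheme are assumptions for illustration, not part of the original answer.

```python
import os
import shutil


def copy_inbox_txt(source_dir, dest_dir):
    """Copy .txt files from every 'inbox' folder, keeping names unique."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for root, dirs, files in os.walk(source_dir):
        # Only act on directories that are themselves named 'inbox'.
        if os.path.basename(root) != "inbox":
            continue
        for name in files:
            if not name.endswith(".txt"):
                continue
            src = os.path.join(root, name)
            # folderA/inbox/1.txt becomes folderA-inbox-1.txt
            rel = os.path.relpath(src, source_dir)
            unique = rel.replace(os.sep, "-")
            shutil.copy(src, os.path.join(dest_dir, unique))
            copied.append(unique)
    return sorted(copied)
```

With the directory tree from the question, copy_inbox_txt("mailDir", "destFolder") would keep both folderA's and folderB's 1.txt under different names.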

I just remembered that I once wrote a few functions to solve this exact problem. You can find the source code on my GitHub.
In short, there are two functions of interest here:
list_files(loc, return_dirs=False, return_files=True, recursive=False, valid_exts=None)
copy_files(loc, dest, rename=False)
For your case, you could copy and paste these functions into your project and modify copy_files like this:
# sub, join and copy are presumably re.sub, os.path.join and shutil.copy
from os.path import join
from re import sub
from shutil import copy
def copy_files(loc, dest, rename=False):
    # get files with full path (list_files is the helper listed above)
    files = list_files(loc, return_dirs=False, return_files=True, recursive=True, valid_exts=('.txt',))
    # copy files in list to dest
    for i, this_file in enumerate(files):
        # change name if renaming
        if rename:
            # strip a leading "./" (the dot is escaped so it matches literally)
            out_file = sub(r'^\./', '', this_file)
            # replace slashes with hyphens to preserve unique name
            out_file = sub(r'\\|/', '-', out_file)
            out_file = join(dest, out_file)
            copy(this_file, out_file)
            files[i] = out_file
        else:
            copy(this_file, dest)
    return files
Then just call it like so:
copy_files('mailDir', 'destFolder', rename=True)
The renaming scheme might not be exactly what you want, but at least it will not overwrite your files. I believe this should solve all your problems.

Here you go:
import os
from os import path
import shutil
destDir = '<absolute-path>'
for root, dirs, files in os.walk(os.getcwd()):
    # Only copy from directories named 'inbox'. (Pruning dirs to 'inbox'
    # at every level would stop the walk before it reaches folderA/inbox.)
    if path.basename(root) != 'inbox':
        continue
    # Keep only '.txt' files.
    files = [f for f in files if f.endswith('.txt')]
    for f in files:
        p = path.join(root, f)
        # print(p)
        shutil.copy(p, destDir)
Quick and simple.
Sorry, I forgot that you also need unique file names. The above solution only works if the file names are distinct, as they are within a single inbox folder.
To copy files from multiple inboxes and give each a unique name in the destination folder, you can try this:
import os
from os import path
import shutil
sourceDir = os.getcwd()
fixedLength = len(sourceDir)
destDir = '<absolute-path>'
filteredFiles = []
for root, dirs, files in os.walk(sourceDir):
    # Keep only '.txt' files in the inbox directories.
    if root.endswith('inbox'):
        # join the file name to the full path while filtering txt files
        files = [path.join(root, f) for f in files if f.endswith('.txt')]
        # add the filtered files to the main list
        filteredFiles.extend(files)
# make tuples of (file path, unique file name); os.sep is '/' or '\\'
filteredFiles = [(f, f[fixedLength+1:].replace(os.sep, '-')) for f in filteredFiles]
for (f, n) in filteredFiles:
    print('copying file... {}'.format(f))
    # copy from the path to the dest directory under the unique name
    shutil.copy(f, path.join(destDir, n))
print('copied {} files to {}'.format(len(filteredFiles), destDir))
If you need to copy all files instead of just .txt files, change the condition f.endswith('.txt') to os.path.isfile(path.join(root, f)) while filtering the files. Note that f on its own is only a file name, so it has to be joined with root before the check.

reading HTML(different folders) files

I want to read HTML files in Python. Normally I do it like this (and it works):
import codecs
f = codecs.open("test.html", 'r')
print(f.read())
The problem is that my HTML files are not all in the same folder: a program generates these files and saves them into folders inside the folder that contains my script.
In short, my script is in a folder, and inside this folder there are more folders containing the generated HTML files.
Does anybody know how I can proceed?

import os
import codecs
for root, dirs, files in os.walk("./"):
    for name in files:
        abs_path = os.path.normpath(root + '/' + name)
        file_name, file_ext = os.path.splitext(abs_path)
        if file_ext == '.html':
            f = codecs.open(abs_path, 'r')
            print(f.read())
This will walk through <script dir>/ and loop through all files in each subdirectory. Note that ./ is resolved against the current working directory, so it only points at the script's directory when you run the script from there.
It checks whether the extension is .html and does the work on each .html file.
You may want to accept more file extensions (for instance .htm).
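Accepting several extensions can be sketched by passing a tuple to str.endswith(). In the example below, the function name read_html_files, the HTML_EXTENSIONS constant, and the UTF-8 default encoding are all assumptions for illustration; the generated files may use a different encoding.

```python
import io
import os

# Hypothetical list of accepted extensions; extend as needed.
HTML_EXTENSIONS = (".html", ".htm")


def read_html_files(start_dir, extensions=HTML_EXTENSIONS, encoding="utf-8"):
    """Return a dict mapping each matching file path to its text."""
    pages = {}
    for root, dirs, files in os.walk(start_dir):
        for name in files:
            # endswith accepts a tuple; lower() makes the match
            # case-insensitive, so "PAGE.HTM" is also accepted.
            if name.lower().endswith(extensions):
                full_path = os.path.join(root, name)
                with io.open(full_path, "r", encoding=encoding) as handle:
                    pages[full_path] = handle.read()
    return pages
```

Returning a dictionary instead of printing keeps the file contents available for further processing.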

Use os.walk:
import os, codecs
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".html"):
            f = codecs.open(os.path.join(root, file), 'r')
            print(f.read())

Os.walk upon reaching a new folder

I wrote this script to make M3U files for my music collection, so I can open just one file and listen to a whole CD or whatever.
What my script does at the moment: it lists every song within the CWD and the underlying folders in one M3U file, which it places in the CWD.
But I also want to make an M3U file in every subfolder of the CWD.
So upon reaching a subfolder, it should open a file named after the current folder, write the names of all the songs in that folder into it, and save the file as "CWD".m3u.
import os,sys
folder_name=os.path.basename(os.getcwd())
folder=os.getcwd()
ext3=['.mp3','.Mp3']
file=open('%s.m3u'%(folder_name),'w')
for root, dirs, files in os.walk(folder):
    for x in files:
        if x[-4:] in ext3:
            print(root+'\\'+x)
            file.write('%s\\%s\n'%(root,x))
file.close()
if not x[-4:] in ext3:
    print("List is empty.")

I think this is what you're looking for. os.walk is actually recursive, so your code can be made to work just by opening a new .m3u file in the directory currently being walked on every iteration of the outer for loop:
import os
exts = ('.mp3','.Mp3')
for root, dirs, files in os.walk(os.getcwd()):
    m3uname = os.path.basename(root)
    with open("{}.m3u".format(os.path.join(root, m3uname)), 'w') as outfile:
        for f in files:
            if f.endswith(exts):
                outfile.write('{}\n'.format(os.path.join(root, f)))
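The loop above creates an .m3u file in every folder, including folders that contain no songs. A possible variation, sketched here under the assumption that empty playlists are unwanted and that any capitalisation of .mp3 should match, only writes a playlist when the folder has tracks. The function name write_playlists is hypothetical.

```python
import os


def write_playlists(top):
    """Write <folder>.m3u in each folder under top that contains .mp3 files.

    Returns the list of playlist paths that were written.
    """
    written = []
    for root, dirs, files in os.walk(top):
        # Case-insensitive match covers .mp3, .Mp3, .MP3 and so on.
        tracks = sorted(f for f in files if f.lower().endswith(".mp3"))
        if not tracks:
            continue  # skip folders without songs
        # abspath normalises a trailing slash so basename is never empty.
        folder_name = os.path.basename(os.path.abspath(root))
        playlist = os.path.join(root, folder_name + ".m3u")
        with open(playlist, "w") as outfile:
            for track in tracks:
                outfile.write(os.path.join(root, track) + "\n")
        written.append(playlist)
    return written
```

Calling write_playlists(os.getcwd()) would reproduce the behaviour asked for in the question, one playlist per folder that holds music.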
