I have 50 instances of two files that live in 50 separate folders within a directory. I am trying to read and extract information from the two files within each folder and append the info from both to lists at the same time, while in the folder that contains them both (so the values will be associated by being appended at the same list index). I'm using os.walk and opening each file as soon as it is recognized (or trying to). When I run it, it seems the files in question are never opened, and definitely nothing is being appended to my lists. Could someone tell me if what I have here is completely ridiculous? It seems logical to me, but it's not working.
import os
import sys
#import itertools

def get_theList():
    # specify directory where jobs are located
    # can also set 'os.curdir' to rootDir to read from current
    rootDir = '/home/my.user.name/O1/injections/test'  # no issues here; this is correct
    B_sig = []
    B_gl = []
    SNR_net = []
    a = 0
    for root, dirs, files in os.walk(rootDir):
        for folder in dirs:
            for file in folder:
                if file == 'evidence_stacked.dat':
                    print 'open'
                    a += 1
                    ev_file = open(file, "r")
                    ev_lin = ev_file.split()
                    B_gl.append(ev_lin[1])
                    B_sig.append(ev_lin[2])
                    print ev_lin[1]
                    ev_file.close()
                if file == 'snr.txt':
                    net_file = open(file, "r")
                    net_lines = net_file.readlines()
                    SNR_net.append(net_lines[2])
                    net_file.close()
    print 'len a'
    print a  # this says 0 on output
    print 'B_sig'
    print B_sig
    print len(B_sig)
    print 'B_net'
    print B_gl
    print len(B_gl)
    print 'SNR_net'
    print SNR_net
    print len(SNR_net)

if __name__ == "__main__":
    get_theList()
From help(os.walk):

    filenames is a list of the names of the non-directory files in dirpath.

You're checking to see if a list is equal to a string:

    files == 'evidence_stacked.dat'

What you really want to do is one of the following:

    for file in files:
        if file == 'evidence_stacked.dat':
            ...

Or...

    if 'evidence_stacked.dat' in files:
        ...

Both will work, but the latter is a bit more efficient.

In response to your edit: instead of...

    for file in folder:
        ...

use...

    for file in os.listdir(os.path.join(rootDir, folder)):
        ...

Also, where you use file after that, replace it with

    os.path.join(rootDir, folder, file)

or store that in a new variable (say, file2) and use that in place of file.
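Putting those fixes together, a minimal sketch of the collection loop might look like this (Python 3 syntax; the function name `collect` and the column indices are assumptions carried over from the question's snippet):

```python
import os

def collect(root_dir):
    """Walk root_dir and gather values from each folder holding both files."""
    B_sig, B_gl, SNR_net = [], [], []
    for root, dirs, files in os.walk(root_dir):
        # 'files' is the list of filenames directly inside 'root'
        if 'evidence_stacked.dat' in files:
            with open(os.path.join(root, 'evidence_stacked.dat')) as f:
                ev_lin = f.read().split()   # split whole file into tokens
                B_gl.append(ev_lin[1])
                B_sig.append(ev_lin[2])
        if 'snr.txt' in files:
            with open(os.path.join(root, 'snr.txt')) as f:
                SNR_net.append(f.readlines()[2])  # third line
    return B_sig, B_gl, SNR_net
```

Because both files are checked while visiting the same `root`, values appended in the same pass end up at the same list index, which keeps the pairing the question asks for.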
I have a folder with a large number of files (mask_folder). The filenames in this folder are built as follows:
asdgaw-1454_mask.tif
lkafmns-8972_mask.tif
sdnfksdfk-1880_mask.tif
etc.
In another folder (test_folder), I have a smaller number of files with filenames written almost the same, but without the addition of _mask. Like:
asdgaw-1454.tif
lkafmns-8972.tif
etc.
What I need is code to find the files in mask_folder whose filenames start identically to the filenames in test_folder, and then copy those files from mask_folder to test_folder.
In that way the test_folder contains paired files as follows:
asdgaw-1454_mask.tif
asdgaw-1454.tif
lkafmns-8972_mask.tif
lkafmns-8972.tif
etc.
This is what I tried; it runs without any errors, but nothing happens:
import shutil
import os

mask_folder = "//Mask/"
test_folder = "//Test/"
n = 8
list_of_files_mask = []
list_of_files_test = []

for file in os.listdir(mask_folder):
    if not file.startswith('.'):
        list_of_files_mask.append(file)
        start_mask = file[0:n]
        print(start_mask)

for file in os.listdir(test_folder):
    if not file.startswith('.'):
        list_of_files_test.append(file)
        start_test = file[0:n]
        print(start_test)

for file in start_test:
    if start_mask == start_test:
        shutil.copy2(file, test_folder)
I have searched for a while but have not found a solution to the problem described above, so any help is really appreciated.
First, you want to get only the files, not the folders as well, so you should probably use os.walk() instead of listdir() to make the solution more robust. Read more about it in this question.
Then, I suggest loading the filenames of the test folder into memory (since they are the smaller set), and then not loading all the other files into memory as well, but instead copying matches right away.
import os
import shutil

test_dir_path = ''
mask_dir_path = ''

# load file names from test folder into a list
test_file_list = []
for _, _, file_names in os.walk(test_dir_path):
    # 'file_names' is a list of strings
    test_file_list.extend(file_names)
    # exit after this directory, do not check child directories
    break

# check mask folder for matches
for _, _, file_names in os.walk(mask_dir_path):
    for name_1 in file_names:
        # we just remove a part of the filename to get exact matches
        name_2 = name_1.replace('_mask', '')
        # we check if 'name_2' is in the file name list of the test folder
        if name_2 in test_file_list:
            print('we copy {} because {} was found'.format(name_1, name_2))
            shutil.copy2(
                os.path.join(mask_dir_path, name_1),
                test_dir_path)
    # exit after this directory, do not check child directories
    break
Does this solve your problem?
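If you would rather keep the prefix comparison from your attempt (matching the first n characters) instead of stripping `_mask`, a sketch along those lines could look like this (`copy_masks` is a made-up name; the fixed n=8 is fragile if basenames vary in length, so the `_mask`-stripping approach above is generally safer):

```python
import os
import shutil

def copy_masks(mask_folder, test_folder, n=8):
    """Copy every mask file whose first n characters match a test file's."""
    # collect the prefixes present in the test folder once, up front
    prefixes = {f[:n] for f in os.listdir(test_folder) if not f.startswith('.')}
    for f in os.listdir(mask_folder):
        if not f.startswith('.') and f[:n] in prefixes:
            # copy with full source path, preserving metadata
            shutil.copy2(os.path.join(mask_folder, f), test_folder)
```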
I am trying to code a script that collects values from .xvg files. I have 20 folders that contain the targeted file; the folders are numbered 1-20 (in the code you see 1.Rimo).
I have already written code that collects the data when I specify the full path; however, I need something generic so I can loop through those 20 folders, get the data, and store it as a variable.
rmsf = open('/home/alispahic/1.CB1_project/12.ProductionRun/1.Rimo/rmsf.xvg', 'r+')

for line in rmsf:
    if line.startswith(' 4755'):
        print(line)
        l = line.split()
        print(l)
        value = float(l[1])
        sum1 = float(sum1) + value
        print(len(l))

print(sum1)
You can use os.listdir():

    base_path = '/home/alispahic/1.CB1_project/12.ProductionRun'
    file_name = 'rmsf.xvg'
    for dir_name in os.listdir(base_path):
        print(dir_name)
        with open(os.path.join(base_path, dir_name, file_name)) as f:
            for line in f:
                # here goes your code
                pass

Just remember to join dir_name with base_path (the path of the directory you are iterating over).
Also note that os.listdir() returns files as well, not just directories. If your folder /home/alispahic/1.CB1_project/12.ProductionRun contains only directories, that won't be a problem; otherwise you would need to filter out the files.
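If the folder does contain stray files, one way to filter them out is os.path.isdir (a sketch; `run_dirs` is a made-up name):

```python
import os

def run_dirs(base_path):
    """Yield the full paths of base_path's subdirectories, skipping plain files."""
    for name in sorted(os.listdir(base_path)):
        full = os.path.join(base_path, name)
        if os.path.isdir(full):  # keep only actual directories
            yield full
```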
I have solved the problem by adding glob:

    for name in glob.glob('/home/alispahic/1.CB1_project/12.ProductionRun/*/rmsf.xvg'):
        for line in open(name):
            if line.startswith(' 4755'):
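For reference, a completed version of that glob loop might look like this, assuming the value to sum is the second column, as in the original snippet (`sum_rmsf` is a made-up name):

```python
import glob

def sum_rmsf(pattern, prefix=' 4755'):
    """Sum the second column of every matching line in every matched file."""
    total = 0.0
    for name in glob.glob(pattern):      # one rmsf.xvg per numbered folder
        with open(name) as fh:           # 'with' closes the file for us
            for line in fh:
                if line.startswith(prefix):
                    total += float(line.split()[1])
    return total
```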
I have a list of strings (stored in a .txt file, one string per line) and I want a script that takes the first line and searches all folder names in a directory, then takes the second line and searches all folder names, and so on. How do I do this? Hope I made myself clear. Thanks!
This example reads paths from a text file and prints them out. Replace the print with your search logic.

    import os

    textfile = open('C:\\folder\\test.txt', 'r')
    for line in textfile:
        rootdir = line.strip()
        for subdir, dirs, files in os.walk(rootdir):
            for file in files:
                print(os.path.join(subdir, file))
Assuming that by searching all folders you mean printing them out to the standard output you can do this:
    from os import listdir
    from os.path import isdir, join

    with open('directories.txt', 'r') as f:
        i = 1
        for line in f.readlines():
            directories = []
            tmp = line.strip('\n')
            for d in listdir(tmp):
                if isdir(join(tmp, d)):
                    directories.append(d)
            print('directory {}: {}'.format(i, directories))
            i += 1
It will output something like this:
directory 1: ['subfolder_1', 'subfolder_0']
directory 2: ['subfolder_0']
directory 3: []
Note that I recommend using with in order to open files since it will automatically properly close them even if exceptions occur.
I want to go through all folders inside a directory:
directory\
    folderA\
        a.cpp
    folderB\
        b.cpp
    folderC\
        c.cpp
    folderD\
        d.cpp
The name of the folders are all known.
Specifically, I am trying to count the number of lines of code in each of the a.cpp, b.cpp, c.cpp and d.cpp source files. So, go inside folderA and read a.cpp, count lines, then go back to directory, go inside folderB, read b.cpp, count lines, etc.
This is what I have up until now,
dir = directory_path
for folder_name in folder_list():
    dir = os.path.join(dir, folder_name)
    with open(dir) as file:
        source = file.read()
        c = source.count_lines()
but I am new to Python and have no idea if my approach is appropriate and how to proceed. Any example code shown will be appreciated!
Also, does the with open handles the file opening/closing as it should for all those reads or more handling is required?
I would do it like this:
import glob
import os

path = 'C:/Users/me/Desktop/'  # give the path where all the folders are located
list_of_folders = ['test1', 'test2']  # give the program a list with all the folders you need
names = {}  # initialize a dict

for each_folder in list_of_folders:  # go through each folder from the list
    full_path = os.path.join(path, each_folder)  # join the path
    os.chdir(full_path)  # change directory to the desired path
    for each_file in glob.glob('*.cpp'):  # self-explanatory
        with open(each_file) as f:  # opens a file - no need to close it
            names[each_file] = sum(1 for line in f if line.strip())

print(names)
Output:
{'file1.cpp': 2, 'file3.cpp': 2, 'file2.cpp': 2}
Regarding the with question, you don't need to close the file or make any other checks. You should be safe as it is now.
You may, however, want to check that full_path exists, as a folder from list_of_folders could have been deleted from your PC by mistake.
You can do this with os.path.isdir, which returns True if the path is an existing directory:

    os.path.isdir(full_path)
PS: I used Python 3.
Use os.walk() to traverse all subdirectories and files of a given path, opening each file and applying your logic. A 'for' loop over the walk simplifies your code greatly.
https://docs.python.org/2/library/os.html#os.walk
As manglano said, os.walk().

You can generate a list of folders:

    [src for src, _, _ in os.walk(sourcedir)]

You can generate a list of file paths:

    [src + '/' + file for src, dir, files in os.walk(sourcedir) for file in files]
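Tying that back to the line-counting task, a minimal sketch that walks every subdirectory and counts lines per .cpp file might look like this (`count_lines` is a made-up name):

```python
import os

def count_lines(source_dir):
    """Map each .cpp path under source_dir to its number of lines."""
    counts = {}
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith('.cpp'):
                path = os.path.join(root, name)
                with open(path) as f:           # closed automatically
                    counts[path] = sum(1 for _ in f)
    return counts
```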
I know how to do a directory walk (using os.walk) and print out all files in a certain directory. What I want to do further is to insert a blank line after the contents of a directory are printed for all directories at a certain level. To illustrate, suppose I have these files:
/level1/level2a/file1.txt
/level1/level2a/level3a/file2.txt
/level1/level2a/level3b/level4/file3.txt
/level1/level2b/file4.txt
/level1/level2b/file5.txt
I want to print them as:
/level1/level2a/file1.txt
/level1/level2a/level3a/file2.txt
/level1/level2a/level3b/level4/file3.txt

/level1/level2b/file4.txt
/level1/level2b/file5.txt
Notice there is a blank line separating the listings of level2a and level2b (but no blank line between level3a and level3b). I want the listings of each directory at level 2 (i.e. two levels down from root) to be separated by blank lines. How do I do this in Python?
P.S. The listing will be quite large, so I don't want to do this by hand. Also, the script needs to be flexible: if the requirement changes to insert blank lines at level 3 (instead of level 2), it needs to handle that.
You can try something like the following. It checks whether the root path has more than the number of directory levels given in the argument variable (hard-coded in the example). In that case it saves the level in the d variable and the previous, different one in prev_d. It then prints files in the normal way, except that it prints a blank line whenever the two variables differ.
Content of script.py:
import os
import sys

arg_dir_level = 4
prev_d = ''
d = ''

for root, dirs, files in os.walk(sys.argv[1]):
    if root.count(os.sep) >= arg_dir_level:
        d = root.split(os.sep, arg_dir_level + 1)[arg_dir_level]
    if prev_d and d and d != prev_d:
        print()
    for file in files:
        print(os.path.abspath(root + os.sep + file))
    prev_d = d
Running it like:
python3 script.py '.'
Part of the output is:
/home/birei/python/ENV/lib/python3.3/site-packages/zope.event-4.0.2-py3.3.egg/zope/event/__init__.py
/home/birei/python/ENV/lib/python3.3/site-packages/zope.event-4.0.2-py3.3.egg/zope/event/__pycache__/tests.cpython-33.pyc
/home/birei/python/ENV/lib/python3.3/site-packages/zope.event-4.0.2-py3.3.egg/zope/event/__pycache__/__init__.cpython-33.pyc
/home/birei/python/ENV/lib/python3.3/site-packages/__pycache__/pkg_resources.cpython-33.pyc
/home/birei/python/ENV/lib/python3.3/site-packages/__pycache__/easy_install.cpython-33.pyc
/home/birei/python/ENV/lib/python3.3/site-packages/selenium-2.35.0-py3.3.egg/EGG-INFO/dependency_links.txt
/home/birei/python/ENV/lib/python3.3/site-packages/selenium-2.35.0-py3.3.egg/EGG-INFO/PKG-INFO
/home/birei/python/ENV/lib/python3.3/site-packages/selenium-2.35.0-py3.3.egg/EGG-INFO/not-zip-safe
As you can see, when the fourth subdirectory changes beginning from the root where I executed the script (ENV), it prints an additional newline. Perhaps you will need to adjust it but the idea will be similar.
Here's a simple way to do what you want.
The basic idea is that whenever the root is at the level we want to do the separation at, we print out a line return. We can check this by splitting the root path on '/' after removing any '/'s at the start of the path. If there are level pieces, we're at the right place, and should insert a line return.
import os

def do_walk(directory, level=2):
    for root, _, files in os.walk(directory):
        if len(root.lstrip('/').split('/')) == level:
            print
        for f in files:
            print os.path.join(root, f)
Of course, this does insert an extra line return at the beginning. If you don't want that, I suggest something like:
import os

def do_walk(directory, level=2):
    first = True
    for root, _, files in os.walk(directory):
        if len(root.lstrip('/').split('/')) == level:
            if first:
                first = False
            else:
                print
        for f in files:
            print os.path.join(root, f)
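A variant of the same idea measures depth relative to the walked directory, so it does not depend on how deep the start directory itself sits in the filesystem. This is a sketch in Python 3; `walk_with_breaks` and the "level below the start directory" convention are my own choices, not from either answer above:

```python
import os

def walk_with_breaks(directory, level=1):
    """Return all file paths under directory as a list of lines, inserting a
    blank line when entering a directory `level` levels below the start."""
    lines = []
    base_depth = directory.rstrip(os.sep).count(os.sep)
    for root, _, files in os.walk(directory):
        depth = root.rstrip(os.sep).count(os.sep) - base_depth
        # start a new group, but not before the very first entry
        if depth == level and lines:
            lines.append('')
        for f in sorted(files):
            lines.append(os.path.join(root, f))
    return lines
```

Returning a list of lines instead of printing makes the separator logic easy to test, and changing the requirement from level 2 to level 3 is just a different `level` argument.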