I am trying to create a simple search engine to look inside a file. To reuse the code I separated out the search function, but for some reason it doesn't work the second time round.
The first time it shows the result as it should, but the second time I type a name it doesn't give me any result at all. It's as if the c variable is not being passed to the searchpart(c, path) function the second time round.
import os

def searchpart(c, path):
    employees = os.walk(path)
    for root, dirs, files in employees:
        names = os.path.basename(root)
        if c.lower() in names.lower():
            print(root)
            os.chdir(root)
            for i in os.listdir():
                print("-----> {}".format(i))

def welcomepart(path):
    # this function allows the application to be reused after a name is searched.
    c = input("\n-------> please introduce the name? \n")
    searchpart(c, path)

def mainfuntion():
    path = 'WORKERS'
    invalid_input = True
    print('______________ Welcome ______________ \n ')
    while invalid_input:
        welcomepart(path)

mainfuntion()
This work-around seems to fix the problem:
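def searchpart(c, path):
    cwd = os.getcwd()
    employees = os.walk(path)
    for root, dirs, files in employees:
        names = os.path.basename(root)
        if c.lower() in names.lower():
            print(root)
            os.chdir(root)
            for i in os.listdir():
                print("-----> {}".format(i))
            # change back right away so the relative path given to os.walk still resolves
            os.chdir(cwd)
It remembers which directory you were in before the function call and changes back to it after listing each match. Without that, os.chdir(root) leaves the process inside the matched directory, so the relative path 'WORKERS' no longer resolves on the next call and os.walk silently yields nothing.
However, I'm sure there is a solution where the line os.chdir(root) is not needed at all.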
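As the last sentence suggests, the os.chdir(root) call can be dropped by passing root to os.listdir() directly. A minimal sketch, assuming the same directory layout as in the question; the returned list of matches is an addition for illustration and is not part of the original function:

```python
import os


def searchpart(c, path):
    """Print every directory under `path` whose name contains `c`, with its contents.

    The working directory is never changed, so a relative `path` such as
    'WORKERS' keeps resolving on every call.
    """
    matches = []  # returned for convenience; not in the original function
    for root, dirs, files in os.walk(path):
        name = os.path.basename(root)
        if c.lower() in name.lower():
            print(root)
            # List the matching directory by path instead of chdir-ing into it.
            for entry in os.listdir(root):
                print("-----> {}".format(entry))
            matches.append(root)
    return matches
```

Because nothing touches the current working directory, the function can be called repeatedly from the input loop with the same relative path.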
Related
I'm new to Python, and I'm trying to create a class (with a method) that takes the file names from a directory, appends them to a list, and then prints the list.
import os

class data:
    def __init__(self):
        self.images = []

    def showImg(self):
        path = r"C:\path"
        dirs = os.listdir(path)
        for file in dirs:
            self.images.append(file)
        return self.images

data1 = data()
print(data1.images)
When I try to run the code all I get is "[ ]" as output.
You never call the showImg method, so the list is still empty. You have three options:
You can call it in __init__:
def __init__(self):
    self.images = []
    self.showImg()
or call it later and then read the attribute:
data1 = data()
data1.showImg()
print(data1.images)
or call it directly and print the list it returns:
data1 = data()
print(data1.showImg())
My goal is to store all files and directories in a structured data tree, where each:
directory is a node
file is a leaf
My code below works fine. However, I only take one step at a time and interrupt/restart the walking process for every directory. (see step_in() method)
Apparently it is possible and considered "advanced" to break into the process of an iteration itself and work with it. Therefore my question is, is it possible to "break into" the os.walk process and yield what's necessary?
import os
import sys
import inspect

DEBUG = True

def report(*args, **kwargs):
    global DEBUG
    if DEBUG: print(*args, **kwargs)

class directory:

    def __init__(self, path):
        self.path = path

    @property
    def name(self):
        return os.path.basename(self.path)

    def __repr__(self):
        ID = hex(id(self))
        return "<directory \"{:}\" at {}>".format(self.name, ID)

    def step_in(self):
        """Step into the dir and find all files/dirs.

        Step into the directory path and search for:
        - directories --> add string name to children (SEMI CHILD)
        - and files   --> add string name to leafs
        """
        for p, d, f in os.walk(self.path):
            self.children = d
            report("--->kids found : {}".format(d))
            self.leafs = f
            report("--->leafs found: {}".format(f))
            return p  # stop after the first (top-level) step

class walker:

    def __init__(self, root_path):
        self.root = directory(root_path)

    def walk(self, target=None):
        """Walk through all dirs and create tree.

        Recursive process with root directory as initial directory.
        """
        if not(target):
            target = self.root
        path = target.step_in()
        for i in range(len(target.children)):
            # get the next path
            next_path = os.path.join(path, target.children[i])
            report("\nnext is: {}".format(next_path))
            # save dir by replacing the string child with an actual child
            target.children[i] = directory(next_path)
            # walk into that child
            self.walk(target.children[i])

if __name__ == "__main__":
    w = walker('/Users/xxx/test/xxx')
    w.walk()
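One way to avoid restarting os.walk for every directory is to run a single top-down walk and attach each directory to its parent as it is yielded. The sketch below is only an illustration under that assumption; Node and build_tree are hypothetical names and are not part of the code above:

```python
import os


class Node:
    """A directory in the tree: sub-directories are children, files are leafs."""

    def __init__(self, path):
        self.path = path
        self.children = []  # Node objects for sub-directories
        self.leafs = []     # plain file names

    @property
    def name(self):
        return os.path.basename(self.path)


def build_tree(root_path):
    """Build the whole tree from one os.walk pass (hypothetical helper)."""
    root_path = os.path.normpath(root_path)
    nodes = {root_path: Node(root_path)}
    # Top-down order guarantees a parent is yielded before its children,
    # so the node for `path` always exists by the time it is looked up.
    for path, dirnames, filenames in os.walk(root_path):
        node = nodes[path]
        node.leafs = sorted(filenames)
        for dirname in sorted(dirnames):
            child = Node(os.path.join(path, dirname))
            nodes[child.path] = child
            node.children.append(child)
    return nodes[root_path]
```

Since os.walk is a generator, a single step can also be taken with next(os.walk(path)), which is what the early return in step_in() does implicitly.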
So I've got a couple of global variables: directory_name and file_list.
They're defined at the top, and then I give them values in main. I need their values in a function called checkDirectory(blocknum). If I print their values before I call the function, they're correct, but inside the function they are empty. This is some of the code:
file_list = []
directory_name = ""

def checkDirectory(blocknum):
    global directory_name
    global file_list
    directory = tokenize(open(directory_name + '/' + file_list[blocknum], 'r').read())

def main():
    try:
        directory_name = sys.argv[1]
        if not os.path.exists(directory_name):
            print("This is not a working directory.")
            return
    except:
        directory_name = os.getcwd()
    files = os.listdir(directory_name)
    file_list = sorted(files, key=lambda x: int((x.split("."))[1].strip()))
    ....
    checkDirectory(26)
This is a basic 100 line script, and I can pass in the variables but I'll have to do that for three or four functions which will be recursive, so I'd rather not have to do it every time.
You are shadowing directory_name and file_list in your main function. Because main() assigns to those names without declaring them global, Python creates new local variables, and the module-level ones keep their initial empty values. To operate on the global variables, you need to declare them global in main() as well:
file_list = []
directory_name = ""

def checkDirectory(blocknum):
    global directory_name
    global file_list
    directory = tokenize(open(directory_name + '/' + file_list[blocknum], 'r').read())

def main():
    global directory_name
    global file_list
    ...
Please remember that, as mentioned in the comments, using globals is not good practice and can lead to bad code in the long run (in terms of unreadable/unmaintainable/buggy).
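The difference between assigning with and without the global declaration can be seen in a few lines. A minimal sketch; set_local, set_global and read_name are hypothetical names used only for this illustration:

```python
directory_name = ""


def set_local():
    # Assignment without `global` creates a new local variable;
    # the module-level directory_name is left untouched.
    directory_name = "/tmp/local"


def set_global():
    # With the declaration, the assignment rebinds the module-level name.
    global directory_name
    directory_name = "/tmp/global"


def read_name():
    # Reading a global needs no declaration.
    return directory_name


set_local()
after_local = read_name()    # still the empty string
set_global()
after_global = read_name()   # now "/tmp/global"
```

Only functions that assign to the name need the declaration, which is why checkDirectory would work even without its two global lines.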
I am trying to create a walker that goes through directories. Here are the inputs and outputs, which I have partly working. I am using a test directory, but I would like this to work on any directory, which is causing some problems.
[IN]: print testdir #name of the directory
[OUT]: ['j','k','l'] #directories under testdir
[IN]: print testdir.j
[OUT]: ['m','n'] # Files under testdir.j
Here is the code so far:
class directory_lister:
    """Lists directories under root"""
    def __init__(self, path):
        self.path = path
        self.ex = []
        for item in os.listdir(path):
            self.ex.append(item)
    def __repr__(self):
        return repr(self.ex)
This returns the directories and files, but I have to assign the names of the directories manually:
testdir = directory_lister('path/to/testdir')
j = directory_lister('path/to/j')
etc.
Is there a way to automate the instances such that:
for root, dirs, files in os.walk('/path/to/testdir/'):
    for x in dirs:
        x = directory_lister(root) #I want j = directory_lister('path/to/j'), k = directory_lister('path/to/k') and l = directory_lister('path/to/l') here.
Can there be a:
class directory_lister:
    def __init__(self, path):
        self.path = path
        self.j = directory_lister(path + os.sep + j) # how to automate this attribute of the class when assigned to an instance??
The code above is wrong: the object x only becomes an instance, while j, k and l still have to be defined manually. Do I have to use another class, or a dictionary with getattr? Whatever I try, I run into the same problem. If any extra information is required please ask; I hope I have made this clear.
UPDATE 2
Is there a way to add other, more complex functions to the DirLister by Anurag below? For example, when it gets to a file, say testdir/j/p, it should print out the first line of file p.
[IN] print testdir.j.p
[OUT] First Line of p
I have made a class for printing out the first line of the file:
class File:
    def __init__(self, path):
        """Read the first line in desired path"""
        self.path = path
        with open(path, 'r') as f:
            self.first_line = f.readline()
    def __repr__(self):
        """Display the first line"""
        return self.first_line
Just need to know how to incorporate it in the class below. Thank you.
I assume you want each sub-directory to be accessible like an attribute. You can achieve that in two ways:
Go through the list of files and create the variables dynamically
Hook into attribute access and return listers as needed
I prefer the second approach, as it is lazy and easier to implement.
import os

class DirLister(object):
    def __init__(self, root):
        self.root = root
        self._list = None

    def __getattr__(self, name):
        try:
            # ask the base class first; raises AttributeError if there is no such attribute
            var = super(DirLister, self).__getattribute__(name)
            return var
        except AttributeError:
            return DirLister(os.path.join(self.root, name))

    def __str__(self):
        self._load()
        return str(self._list)

    def _load(self):
        """
        load once when needed
        """
        if self._list is not None:
            return
        self._list = os.listdir(self.root) # list root someway

root = DirLister("/")
print(root.etc.apache2)
output:
['mods-enabled', 'sites-80', 'mods-available', 'ports.conf', 'envvars', 'httpd.conf', 'sites-available', 'conf.d', 'magic', 'apache2.conf', 'sites-enabled']
You can improve this with better error checking, etc.
Code explanation: this is basically a recursive listing of a directory. A DirLister object lists the files under the given root, and if an attribute is accessed with dotted notation it returns another DirLister, assuming that the attribute names a folder under the root. Building the DirLister class step by step makes this clearer.
1- A simple DirLister which just lists files/folders under it
class DirLister(object):
    def __init__(self, root):
        self.root = root
        self._list = os.listdir(self.root)
2- Our simple lister only lists files one level deep. To get the files under sub-folders we can hook into __getattr__, which is called with varname when obj.varname is used and normal lookup fails. So if our dir-lister doesn't have an attribute named varname, we assume the user is trying to access that directory under the given root, and we create another DirLister whose root is root+subdirname
    def __getattr__(self, name):
        try:
            var = super(DirLister, self).__getattribute__(name)
            return var
        except AttributeError:
            return DirLister(os.path.join(self.root, name))
Note: first we check the base class for that attribute, because we don't want to treat every attribute access as sub-directory access; if there is no such attribute (hence the AttributeError), we create a new DirLister for the sub-folder.
3- To avoid listing folders the user never asked for, we only list when required, hence a load method
    def _load(self):
        if self._list is not None:
            return
        self._list = os.listdir(self.root) # list root someway
So this method lists the directory if it has not been listed already, and it should be called when we finally need the list, e.g. while printing it.
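The same __getattr__ hook could also serve the UPDATE 2 request from the question, returning the first line when the attribute names a file. A sketch in Python 3, assuming the File class from the question is rewritten with __str__; the guard for names starting with an underscore is an addition of this sketch and not part of the answer:

```python
import os


class File:
    """Holds the first line of a file (same idea as the File class in the question)."""

    def __init__(self, path):
        self.path = path
        with open(path, "r") as f:
            self.first_line = f.readline()

    def __str__(self):
        return self.first_line


class DirLister:
    def __init__(self, root):
        self.root = root
        self._list = None

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        if name.startswith("_"):
            # Assumption of this sketch: never map private/dunder names to the file system.
            raise AttributeError(name)
        path = os.path.join(self.root, name)
        if os.path.isfile(path):
            return File(path)
        return DirLister(path)

    def _load(self):
        # Load once, when the listing is first needed.
        if self._list is None:
            self._list = os.listdir(self.root)

    def __str__(self):
        self._load()
        return str(self._list)
```

Names that are not valid Python identifiers, such as p.txt, cannot be written with dotted notation, but getattr(testdir.j, "p.txt") still goes through the same hook.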
Edit: as asked by the OP, here is the alternative method that recursively lists the whole tree, though I would strongly recommend against it
import os

class RecursiveDirLister(object):
    def __init__(self, root):
        self._sublist = []
        for folder in os.listdir(root):
            self._sublist.append(folder)
            path = os.path.join(root, folder)
            if not os.path.isdir(path):
                continue
            # add it as attribute, assuming that dir-name is valid python varname
            try:
                sublister = RecursiveDirLister(path)
            except OSError:
                continue  # ignore permission errors etc
            setattr(self, folder, sublister)

    def __str__(self):
        return str(self._sublist)

etc = RecursiveDirLister("/etc")
print(etc.fonts)
output:
['conf.avail', 'conf.d', 'fonts.conf', 'fonts.dtd']
Not sure what you're asking, but would this work?
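listers = {}
for root, dirs, files in os.walk('/path/to/testdir/'):
    # key by directory name; build each lister from the full path, not the bare name
    listers.update((dir, directory_lister(os.path.join(root, dir))) for dir in dirs)
#now you can use:
listers['j']
listers['k']
listers['l']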
I wrote the code below to check for three files and run a "scan" on whichever of them exist (if a file does not exist, don't worry about it; just scan the available files), producing the proper output file for each available file.
The program I'm working on includes the following code:
def InputScanAnswer():
    scan_number = raw_input("Enter Scan Type number: ")
    return scan_number
This function checks whether these three files exist and, if so, assigns specific values to hashcolumn and to filepathNum:
def chkifexists():
    list = ['file1.csv', 'file2.csv', 'file3.csv']
    for filename in list:
        if os.path.isfile(filename):
            if filename == "file1.csv":
                hashcolumn = 7
                filepathNum = 5
            if filename == "file2.csv":
                hashcolumn = 15
                filepathNum = 5
            if filename == "file3.csv":
                hashcolumn = 1
                filepathNum = 0
            #print filename, hashcolumn, filepathNum
def ScanChoice(scan_number):
    if scan_number == "1":
        chkifexists()
        onlinescan(filename, filename + "_Online_Scan_Results.csv", hashcolumn, filepathNum) #this is what is giving me errors...
    elif scan_number == "2":
        print "this is scan #2"
    elif scan_number =="3":
        print "this is scan #3"
    else:
        print "Oops! Invalid selection. Please try again."
def onlinescan(FileToScan, ResultsFile, hashcolumn, filepathNum):
    # web scraping stuff is done in this function
The error that I run into is: global name 'filename' is not defined.
I realize that the problem is that I'm attempting to pass local variables from chkifexists() to the onlinescan() parameters. I tried using
return filename
return hashcolumn
return filepathNum
at the end of the chkifexists() function, but that did not work either. Is there any way to do what I'm trying to do in the
onlinescan(filename, filename + "_Online_Scan_Results.csv", hashcolumn, filepathNum)
line without using global variables? I know they are discouraged, and I'm hoping I can go about it another way. Also, does having hashcolumn and filepathNum as parameters in onlinescan() have anything to do with this?
Inside chkifexists, you would return all three variables like so:
return (filename, hashcolumn, filepathNum)
You would retrieve these by calling the function like so:
(filename, hashcolumn, filepathNum) = chkifexists()
You now have them in your function scope without needing global variables!
Technically, you don't need the parentheses, either. In fact, I'm not sure why I included them. But it works either way, so what the heck.
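Since chkifexists loops over three files, a single return inside the loop would stop at the first file found. One way around that is to collect one tuple per existing file and return the list. A sketch written for Python 3 under that assumption; the SETTINGS table and the directory parameter are hypothetical additions, while the numbers come from the question:

```python
import os

# (hashcolumn, filepathNum) per known file, taken from the question.
SETTINGS = {
    "file1.csv": (7, 5),
    "file2.csv": (15, 5),
    "file3.csv": (1, 0),
}


def chkifexists(directory="."):
    """Return a list of (filename, hashcolumn, filepathNum) for the files that exist."""
    found = []
    for filename, (hashcolumn, filepathNum) in SETTINGS.items():
        if os.path.isfile(os.path.join(directory, filename)):
            found.append((filename, hashcolumn, filepathNum))
    return found


# Intended use inside ScanChoice (onlinescan is the question's function):
#
#     for filename, hashcolumn, filepathNum in chkifexists():
#         onlinescan(filename, filename + "_Online_Scan_Results.csv",
#                    hashcolumn, filepathNum)
```

Each tuple is unpacked in the for statement, so the three values arrive in the caller's scope without any global variables.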