Searching filesystem using Python

I have been coding in Python for the last two weeks and am pretty new to it.
I have written some code to emulate the way the "find" command works on *NIX systems. It works okay-ish for shallow directories, but if I start searching from the root directory it takes far too long and the processor heats up :D whereas the "find" command takes about 8 seconds for the same search.
I know I am still kind of a noob at Python, but any hint at improving the search efficiency will be greatly appreciated.
Here's what I have written:
#!/usr/bin/python3
import os


class srchx:
    file_names = []
    is_prohibit = False

    def show_result(self):
        if(self.is_prohibit):
            print("some directories were denied read-access")
        print("\nsearch returned {0} result(s)".format(len(self.file_names)))
        for _file in self.file_names:
            print(_file)

    def read_dir(self, cur_dir, srch_name, level):
        try:
            listing = os.listdir(cur_dir)
        except:
            self.is_prohibit = True
            return
        dir_list = []
        #print("-"*level+cur_dir)
        for entry in listing:
            if(os.path.isdir(cur_dir+"/"+entry)):
                dir_list.append(entry)
            else:
                if(srch_name == entry):
                    self.file_names.append(cur_dir+"/"+entry)
        for _dir in dir_list:
            new_dir = cur_dir + "/" + _dir
            self.read_dir(new_dir, srch_name, level+1)
        if(level == 0):
            self.show_result()

    def __init__(self, dir_name=os.getcwd()):
        srch_name = ""
        while(len(srch_name) == 0):
            srch_name = input("search for: ")
        self.read_dir(dir_name, srch_name, 0)


def main():
    srch = srchx()


if (__name__ == "__main__"):
    main()
Take a look and please help me solve this issue.

There is a built-in directory-walking function called os.walk(), but even os.walk() is slow. If you want to browse faster, you need access to the operating system's file-browsing facilities.
https://pypi.python.org/pypi/scandir
scandir is a solution.
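If it helps, here is a minimal sketch of the same kind of search built on os.walk (the search term and starting directory come from user input, as in the question); it avoids the hand-rolled recursion and the per-entry isdir() calls:

#!/usr/bin/python3
# Minimal sketch of the search using os.walk; names are placeholders.
import os

def find(start_dir, srch_name):
    matches = []
    for root, dirs, files in os.walk(start_dir):
        if srch_name in files:
            matches.append(os.path.join(root, srch_name))
    return matches

if __name__ == "__main__":
    for path in find(os.getcwd(), input("search for: ")):
        print(path)

This will not make the traversal dramatically faster than your version (both are bound by the number of listdir/stat calls), which is why scandir was proposed as the faster backend.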

What user1767754 said. You can't really improve the speed much using the methods you're calling. os.walk() is a bit more efficient, though. I've never used scandir (or pypi) so I can't comment.
BTW, that's rather good looking code for a noob, Marty! But there are a couple of issues with it.
It's not a good idea to initialise file_names and is_prohibit like that because it makes them class variables; initialise them in __init__.
You should read srch_name outside the class and pass it to your class constructor. You do that by making it an arg of __init__, as described in the link above.
It's generally good policy to handle user input in the outermost parts of your code (when practical) rather than doing it in the inner parts of your code. I like to think of my user input routines as border guards that only let good input into the inner sanctum of my code. Users are unpredictable critters and there's no telling what mischief they'll get up to. :)
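To make those last two points concrete, here is one possible sketch (not the only way to do it) with instance attributes set in __init__ and the search term read outside the class and passed in:

#!/usr/bin/python3
# Sketch only: instance attributes in __init__, user input handled outside
# the class and passed to the constructor.
import os

class Srchx:
    def __init__(self, srch_name, dir_name=None):
        self.file_names = []            # instance variables, not class variables
        self.is_prohibit = False
        self.srch_name = srch_name
        self.dir_name = dir_name if dir_name is not None else os.getcwd()

    def read_dir(self, cur_dir):
        try:
            listing = os.listdir(cur_dir)
        except OSError:                 # catch the specific error, not everything
            self.is_prohibit = True
            return
        for entry in listing:
            full = os.path.join(cur_dir, entry)
            if os.path.isdir(full):
                self.read_dir(full)
            elif entry == self.srch_name:
                self.file_names.append(full)

def main():
    srch_name = ""
    while len(srch_name) == 0:          # the "border guard": validate input here
        srch_name = input("search for: ")
    srch = Srchx(srch_name)
    srch.read_dir(srch.dir_name)
    if srch.is_prohibit:
        print("some directories were denied read-access")
    print("\nsearch returned {0} result(s)".format(len(srch.file_names)))
    for name in srch.file_names:
        print(name)

if __name__ == "__main__":
    main()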


My code runs without calling the variable in Python?

When I run the code below, my websites or Steam still open. Shouldn't I need to write print(link) or print(steam) for them to open?
import os
import webbrowser
import subprocess
import random
urls = ['https://www.ft.com/',
        'https://www.youtube.com/watch?v=xvqcFcfhUVQ',
        'https://roadmap.sh/backend',
        'https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7']
foxpath = 'C:/Program Files/Mozilla Firefox/Firefox.exe %s'
link = webbrowser.get(foxpath).open(random.choice(urls))
steam = subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
Why does this happen?
I eventually want to run the program from a function call, like below.
def wildcard():
    print(random.choice(link, steam))

wildcard()
No, there is nothing special about print. print is just a function that takes in some value and displays it to the user.
If you had instead steam = 3 * 4, would you be surprised to learn that the value 12 is computed, and steam becomes a name for that value, even if you don't do anything with it? It's the same thing here - calling subprocess.call causes the program to launch, and it has nothing to do with the name steam, nor anything that you do or don't do with that name subsequently.
If you were to add the print(steam) line that you have in mind, what it would display (after you close steam and control returns to your program) is the "return code" of that program - this gets into the details of how your operating system works, but most likely it would be something like 0.
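For instance, a short sketch (same path as in the question) of what that would look like:

# Sketch: subprocess.call blocks until the launched program exits and then
# returns that program's exit code as an integer.
import subprocess

steam = subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
print(steam)   # most likely prints 0 once control returns to the script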
If you want something that you can later call in order to launch Steam - well, that's a function. Like you already know how to do:
def steam():
    subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
As soon as you issue webbrowser.get or subprocess.call, they execute. Your variables are really storing the return values of those functions, not aliases to those function calls.
If you want to alias the function calls as it appears you are intending, you could do something like this:
def open_link():
    return webbrowser.get(foxpath).open(random.choice(urls))

def open_steam():
    return subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
Then your top level would be:
def wildcard():
    random.choice([open_link, open_steam])()

wildcard()
Note the syntax difference for choosing the functions randomly. See this answer for more clarification.
You do invoke something:
steam = subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
The documentation for subprocess.call is clear on what the call method does: it invokes the given argument as a subprocess.
The problem is that your code isn't inside a function, so when you execute it, it runs all the directives, including
steam = subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
which calls C:/Program Files (x86)/Steam/Steam.exe, opening your Steam app.
Hope it helps.
Thank you, I understand now. I actually just resolved the problem with the following
def wildcard():
    for x in range(1):
        rand = random.randint(1, 10)
        if rand > 5:
            link = webbrowser.get(foxpath).open(random.choice(urls))
        else:
            subprocess.call(['C:/Program Files (x86)/Steam/Steam.exe'])
wildcard()

Is it better to have multiple files for scripts or just one large script file with every function? [closed]

I started a project about a year ago involving a simple terminal-based RPG in Python 3. Without really thinking about it, I just jumped in. I started by organizing multiple scripts, one for each... well, function. But halfway into the project I'm not sure whether, for the end goal, it's easier or more efficient to have just one very large script file or multiple files.
Since I'm using the cmd module for the terminal, I'm realizing that getting the actual app running as a looping game might be challenging with all these external files, but at the same time I have an __init__.py file to combine all the functions for the main run script. Here's the file structure.
To clarify, I'm not the greatest programmer, and I'm a novice in Python. I'm not sure of the compatibility issues with the cmd module yet.
So my question is this: should I keep this structure, and will it work as intended? Or should I combine all those assets scripts into one file? Or even make them part of the start.py that uses cmd? Here's the start function, plus some snippets of various scripts.
start.py
from assets import *
from cmd import Cmd
import pickle
from test import TestFunction
import time
import sys
import os.path
import base64


class Grimdawn(Cmd):

    def do_start(self, args):
        """Start a new game with a brand new hero."""
        #fill

    def do_test(self, args):
        """Run a test script. Requires dev password."""
        password = str(base64.b64decode("N0tRMjAxIEJSRU5ORU1BTg=="))
        if len(args) == 0:
            print("Please enter the password for accessing the test script.")
        elif args == password:
            test_args = input('> Enter test command.\n> ')
            try:
                TestFunction(test_args.upper())
            except IndexError:
                print('Enter a command.')
        else:
            print("Incorrect password.")

    def do_quit(self, args):
        """Quits the program."""
        print("Quitting.")
        raise SystemExit


if __name__ == '__main__':
    prompt = Grimdawn()
    prompt.prompt = '> '
    #ADD VERSION SCRIPT TO PULL VERSION FROM FOR PRINT
    prompt.cmdloop('Joshua B - Grimdawn v0.0.3 |')
test.py
from assets import *


def TestFunction(args):
    player1 = BaseCharacter()
    player2 = BerserkerCharacter('Jon', 'Snow')
    player3 = WarriorCharacter('John', 'Smith')
    player4 = ArcherCharacter('Alexandra', 'Bobampkins')
    shop = BaseShop()
    item = BaseItem()
    #//fix this to look neater, maybe import switch case function
    if args == "BASE_OFFENSE":
        print('Base Character: Offensive\n-------------------------\n{}'.format(player1.show_player_stats("offensive")))
        return
    elif args == "BASE_DEFENSE":
        print('Base Character: Defensive\n-------------------------\n{}'.format(player1.show_player_stats("defensive")))
        return
* * *
player.py
#import functions used by script
#random is a math function used for creating random integers
import random
#pickle is for saving/loading/writing/reading files
import pickle
#sys is for system-related functions, such as quitting the program
import sys


#create a class called BaseCharacter, aka an Object()
class BaseCharacter:

    #define what to do when the object is created, or when you call player = BaseCharacter()
    def __init__(self):
        #generate all the stats. these are the default stats, not necessarily used by the final class when the player starts to play.
        #round(random.randint(25, 215) * 2.5) creates a random number between 25 and 215, multiplies it by 2.5, then rounds it to the nearest whole number
        self.gold = round(random.randint(25, 215) * 2.5)
        self.currentHealth = 100
        self.maxHealth = 100
        self.stamina = 10
        self.resil = 2
        self.armor = 20
        self.strength = 15
        self.agility = 10
        self.criticalChance = 25
        self.spellPower = 15
        self.intellect = 5
        self.speed = 5
        self.first_name = 'New'
        self.last_name = 'Player'
        self.desc = "Base Description"
        self.class_ = None
        self.equipment = [None] * 6

    #define the function to update stats when the class is set
    def updateStats(self, attrs, factors):
        #try to do a function
        try:
            #iterate, or go through the data
            for attr, fac in zip(attrs, factors):
                val = getattr(self, attr)
                setattr(self, attr, val * fac)
        #except an error with a value given or not existing values
        except:
            raise RuntimeError("Error updating stats.")

    #print out the stats when called
    #adding the category line in between the ( ) makes it require a parameter when called
    def show_player_stats(self, category):
* * *
Note
The purpose of the snippets above is to show what kind of structure the scripts have, to help support the question of whether or not I should combine them.
First, a bit of terminology:
a "script" is a Python (.py) file that is intended to be executed directly (python myscript.py)
a "module" is a Python file (usually containing mostly function and class definitions) that is intended to be imported by a script or another module.
a "package" is a directory, possibly containing modules, plus an __init__.py file (mandatory in Python 2, optional in Python 3).
You can check the tutorial for more on modules and packages.
Basically, what you want is to organize your code in coherent units (packages / modules / scripts).
For a complete application, you will typically have a "main" module (it doesn't have to be named "main.py" - actually, it's often named after the application itself) that will only import some definitions (from the stdlib, from third-party libs and from your own modules), set things up and run the application's entry point. In your example that would be the "start.py" script.
For the remaining code, what you want is for each module to have strong cohesion (the functions and classes defined in it are closely related and work together to implement the same feature) and low coupling (each module is as independent as possible of other modules). You can technically put as many functions and classes as you want in a single module, but an overly large module can become a pain to maintain, so if after a first reorganisation based on high cohesion / low coupling you find yourself with a 5000+ line module, you'll probably want to turn it into a package with more specialized submodules.
If you still have a couple of utility functions that clearly don't fit in any of your modules, the usual solution is to put them together in a "utils.py" (or "misc.py" or "helpers.py" etc.) module.
Two things that you absolutely want to avoid are:
circular dependencies, either direct (module A depends on module B, and module B depends on module A) or indirect (module A depends on module B, which depends on module C, which depends on module A). If you find you have such a case, it means you should either merge the two modules together or extract some definitions into a third module.
wildcard imports ("from module import *"), which are a major pain with respect to maintainability (you can't tell from the import where a given name comes from) and make the code subject to unexpected - and sometimes not obvious - breakage.
As you can see, this is still quite a generic guideline, but deciding what belongs together cannot be automated and in the end depends on your own judgement.
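As a hedged illustration of the wildcard-import point, test.py could import exactly the names it uses; the module paths below are guesses based on the snippets in the question, so adjust them to the real layout:

# Hypothetical explicit imports for test.py (module paths are assumptions).
from assets.player import BaseCharacter, BerserkerCharacter, WarriorCharacter, ArcherCharacter
from assets.shop import BaseShop
from assets.item import BaseItem

With explicit imports, a reader (and most IDEs and linters) can tell immediately where each class is defined.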
The way you have it currently is fine; personally I much prefer a lot of files, as it's a lot easier to maintain. The main issue I see is that all of your code is going under assets, so either you'll end up with everything dumped there (defeating the point of calling it that), or you'll eventually end up with a bit of a mess of folders once you start coding other bits such as the world/levels and so on.
A quite common way of designing projects is: your root would be Grimdawn, which contains one file to call your code, and all your actual code goes in Grimdawn/grimdawn. I would personally forget the assets folder and instead put everything at the root of that folder, only going deeper if some of the files get more complex or could be grouped.
I would suggest something like this (put in a couple of additions as an example):
Grimdawn/characters/Jon_Snow
Grimdawn/characters/New_Player
Grimdawn/start.py
Grimdawn/grimdawn/utils/(files containing generic functions that are not game specific)
Grimdawn/grimdawn/classes.py
Grimdawn/grimdawn/combat.py
Grimdawn/grimdawn/items.py
Grimdawn/grimdawn/mobs/generic.py
Grimdawn/grimdawn/mobs/bosses.py
Grimdawn/grimdawn/player.py
Grimdawn/grimdawn/quests/quest1.py
Grimdawn/grimdawn/quests/quest2.py
Grimdawn/grimdawn/shops.py
The Pythonic approach to what goes into a single file (I'll discuss it as it applies largely to classes) is that a single file is a module (not a package, as I said previously).
A number of tools will typically exist in a single package, but all of the tools in a single module should remain centered around a single theme. With that said, for a very small project I will typically keep everything in a single file with several functions and maybe a few classes inside. I would then use an if-main guard to contain the script as I want it run in its entirety:
if __name__ == '__main__':
I would break logic down into functions as much as makes sense so that the main body of the script is readable as higher level logic.
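For example, a minimal sketch of that layout (the file name and helper are placeholders):

# Sketch: definitions at module level; the script body lives under the guard,
# and main() reads as the high-level logic.
def load_data(path):
    """Small helper so the main flow stays readable."""
    with open(path) as handle:
        return handle.read()

def main():
    text = load_data("example.txt")    # hypothetical input file
    print(len(text))

if __name__ == '__main__':
    main()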
Short answer: A file for every function is not manageable on any scale. You should put things together into files (modules) with related functionality. It is up to you as to whether the current functions should be clustered together into modules or not.
There are several ways to approach organizing your code and it ultimately comes down to:
Personal Preference
Team coding standards for your Project
Naming / Structure / Architecture Conventions for your Company
The way I organize my Python code is by creating several directories:
class_files (Reusable code)
input_files (Files read by scripts)
output_files (Files written by scripts)
scripts (Code executed)
This has served me pretty well. Import your paths relatively so the code can be run from any place it is cloned. Here's how I handle the imports in my script files:
import sys

# OS Compatibility for importing Class Files
if(sys.platform.lower().startswith('linux')):
    sys.path.insert(0,'../class_files/')
elif(sys.platform.lower().startswith('win')):
    sys.path.insert(0,'..\\class_files\\')

from some_class_file import my_reusable_method
This approach also makes it possible to run your code under various versions of Python; the code can detect the version and import what it needs accordingly:
if(sys.version.find('3.4') == 0):
    if(sys.platform.lower().startswith('linux') or sys.platform.lower().startswith('mac')):
        sys.path.insert(0,'../modules/Python34/')
        sys.path.insert(0,'../modules/Python34/certifi/')
        sys.path.insert(0,'../modules/Python34/chardet/')
        sys.path.insert(0,'../modules/Python34/idna/')
        sys.path.insert(0,'../modules/Python34/requests/')
        sys.path.insert(0,'../modules/Python34/urllib3/')
    elif(sys.platform.lower().startswith('win')):
        sys.path.insert(0,'..\\modules\\Python34\\')
        sys.path.insert(0,'..\\modules\\Python34\\certifi\\')
        sys.path.insert(0,'..\\modules\\Python34\\chardet\\')
        sys.path.insert(0,'..\\modules\\Python34\\idna\\')
        sys.path.insert(0,'..\\modules\\Python34\\requests\\')
        sys.path.insert(0,'..\\modules\\Python34\\urllib3\\')
    else:
        print('OS ' + sys.platform + ' is not supported')
elif(sys.version.find('2.6') == 0):
    if(sys.platform.lower().startswith('linux') or sys.platform.lower().startswith('mac')):
        sys.path.insert(0,'../modules/Python26/')
        sys.path.insert(0,'../modules/Python26/certifi/')
        sys.path.insert(0,'../modules/Python26/chardet/')
        sys.path.insert(0,'../modules/Python26/idna/')
        sys.path.insert(0,'../modules/Python26/requests/')
        sys.path.insert(0,'../modules/Python26/urllib3/')
    elif(sys.platform.lower().startswith('win')):
        sys.path.insert(0,'..\\modules\\Python26\\')
        sys.path.insert(0,'..\\modules\\Python26\\certifi\\')
        sys.path.insert(0,'..\\modules\\Python26\\chardet\\')
        sys.path.insert(0,'..\\modules\\Python26\\idna\\')
        sys.path.insert(0,'..\\modules\\Python26\\requests\\')
        sys.path.insert(0,'..\\modules\\Python26\\urllib3\\')
    else:
        print('OS ' + sys.platform + ' is not supported')
else:
    print("Your OS and Python Version combination is not yet supported")

Proper position of imports & definitions in called scripts?

I have one script that calls another - let's call them master.py and fetch.py. I suppose the second script could be integrated into the first, but it does have distinct functionality - so keeping them separate seems like a good way to force myself to learn how to call outside scripts.
Here's the basic structure of fetch.py:
<import block>

infiles = <paths>
arcpy.env.workspace = os.path.dirname(infile)
ws = arcpy.env.workspace
newfile_list = []

def main():
    name = <name>
    if not arcpy.Exists(name + ".gdb"):
        global ws
        new_gdb = DM.CreateFileGDB(ws, name + ".gdb")
        newfile_list.append(new_gdb)
    other_func1()
    other_func2()
    print "\nNew files from fetch.py:"
    for i in newfile_list:
        print " " + i

def other_func1():
    stuff

def other_func2():
    stuff

if __name__ == '__main__':
    main()
And master.py:
<import block>
infiles = <paths>
def f1():
stuff
def f2():
stuff
import fetch
fetch.main()
f1()
f2()
The problems concern the placement of the import block and file definitions in fetch.py:
When I put them inside main() and run it as a standalone script, my arcpy functions don't work because my various imports haven't run yet. Putting them ahead of main() and making them global solves this problem.
But when I put them outside of main(), as you see here, I get an error saying the local variable ws is referenced before assignment. I think this may have to do with my calling fetch.main(), where the initial lines don't get read. (I was able to make it work by declaring global ws, but I don't know if this is advisable.)
How do I structure fetch.py so that the import statements and file definitions get read, both when run as a standalone script and when called?
Instead of rewriting your code for you, I will try to point you to some ideas from the developer philosophy toolkit to get you on your way to refining your approach.
My feeling is that you should contemplate YAGNI and KISS when thinking about your code.
In your comment you write:
I'm keeping the 'fetch' script separate because finding feature orientation seems like a useful stand-alone tool down the road.
Like I say: YAGNI and KISS. If you need it as a standalone tool in the future, then split it out in the future. For now, keep it simple: put the code that belongs together in one module and make it callable exactly the way you need it. Once you work with it, understand the problem better and see what else is needed, you can add other ways of calling your script. There is no reason why you shouldn't keep all the code in one module while it is still manageable and just add different ways to call it.
If the time comes that you want to organize your code into several modules, you can refactor it in a way that pulls the things you currently do at module level out into functions (or maybe even classes). Then you can control exactly what should happen when, and you can control the order of things better.
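If and when you do refactor, a rough sketch of that idea applied to fetch.py (generic placeholders here, not the real arcpy calls) is to move the module-level setup into functions, so the standalone run and the call from master.py go through the same code path:

# Sketch only: work that currently happens at module level in fetch.py moves
# into functions, so importing the module has no side effects.
def setup(infile):
    """Stand-in for the module-level workspace setup in the question."""
    workspace = infile.rsplit("/", 1)[0]    # placeholder for the arcpy setup
    return workspace

def main(infile):
    workspace = setup(infile)
    newfile_list = []
    # ... body of the original main(), using workspace explicitly ...
    return newfile_list

if __name__ == '__main__':
    main("<path>")    # standalone run
# master.py would instead do:  import fetch; fetch.main(some_path)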
As further reading in regards to structuring a Python project I would recommend the pypa sample project and Starting A Python Project The Right Way. The problems you have with imports will likely cease to exist if you structure your code and your project in a sensible, pythonic way.

Using getopt() to get values passed from the command line

I am writing a Python script to create directories for a given term and course. I would like to make use of the Python modules os, sys, and getopt (with both short and long form options) so that the running script would look like:
>python directory.py -t fall2013 -c cs311-400
>python directory.py --term fall2013 --class cs311-400
The code that I have written now looks like this:
import os
import sys
import getopt
term = ""
course = ""
options, args = getopt.getopt(sys.argv[1:], 't:c:', ['term=', 'course='])
for opt, arg in options:
    if opt in ('-t', '--term'):
        term = arg
    elif opt in ('-c', '--course'):
        course = arg
After this, I have a function that takes in the term and course and uses os.mkdir and such:
def make_folders(term, course):
    if not os.path.isdir(term + course):
        os.mkdir(term + course)
    path = os.path.join(term + course, "assignments")
    os.makedirs(path)
    path = os.path.join(term + course, "examples")
    os.makedirs(path)
    path = os.path.join(term + course, "exams")
    os.makedirs(path)
    path = os.path.join(term + course, "lecture_notes")
    os.makedirs(path)
    path = os.path.join(term + course, "submissions")
    os.makedirs(path)

make_folders(term, course)
For some reason, the folder that gets made only has a name that represents the term rather than both the term and the course. I feel like this might have something to do with my use of getopt, but I'm not certain. Any advice?
os.path.join is a clever function. Just pass as many folders as you need:
>>> import os
>>> os.path.join("first", "second", "third")
'first/second/third'
When you write term + course, Python concatenates the strings in term and course directly, before os.path.join() even sees them. That is, if, say, term == "fall2013" and course == "cs311-400", then term + course == "fall2013cs311-400" with nothing in between.
One way around that would be to insert an explicit slash between the term and the course, as in term + "/" + course. However, since you've presumably been instructed to use os.path.join() (which is a good idea, anyway), you can just pass all the path components you want to join to it as separate arguments and let it take care of joining them for you:
path = os.path.join(term, course, "exams")
Also, a few tips for your assignment, and for good Python coding in general:
While the getopt module is not actually deprecated like rtrwalker claims in the comments, you're probably better off using argparse unless you have to use getopt for some reason (like, say, the assignment tells you to).
Your code looks very repetitive. Repetitive code is a "smell" that should suggest the need for a loop, perhaps like this:
dirs = ("assignments", "examples", "exams", "lecture_notes", "submissions")
for folder in dirs:
    path = os.path.join(term, course, folder)
    os.makedirs(path)
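For reference, if you do switch, a minimal argparse sketch of the same option handling might look like this (it keeps the --term/--course long names from the getopt call above, since class is a Python keyword and would make an awkward attribute name):

# Sketch: the same options handled by argparse instead of getopt.
import argparse

parser = argparse.ArgumentParser(description="Create term/course directories.")
parser.add_argument("-t", "--term", required=True)
parser.add_argument("-c", "--course", required=True)
args = parser.parse_args()

print(args.term, args.course)    # pass these to make_folders() from the question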
I'm actually in your class; at least I'm almost positive I am. I was running into this exact problem and was having trouble getting that sub-directory. A Google search landed me here! Well, after a few more Googles and LOTS of troubleshooting, I found the answer to our problem!
os.makedirs(''+term+'/'+course+'')
path = os.path.join(''+term+'/'+course+'', "exams")
os.makedirs(path)
That should clean it up for you and give you your new directory and sub-directories! Good luck with the rest of the assignment.

Is there a more Pythonic approach to this?

This is my first python script, be ye warned.
I pieced this together from Dive Into Python, and it works great. However, since it is my first Python script, I would appreciate any tips on how it can be made better, or approaches that may better embrace the Python way of programming.
import os
import shutil


def getSourceDirectory():
    """Get the starting source path of folders/files to backup"""
    return "/Users/robert/Music/iTunes/iTunes Media/"


def getDestinationDirectory():
    """Get the starting destination path for backup"""
    return "/Users/robert/Desktop/Backup/"


def walkDirectory(source, destination):
    """Walk the path and iterate directories and files"""
    sourceList = [os.path.normcase(f)
                  for f in os.listdir(source)]
    destinationList = [os.path.normcase(f)
                       for f in os.listdir(destination)]
    for f in sourceList:
        sourceItem = os.path.join(source, f)
        destinationItem = os.path.join(destination, f)
        if os.path.isfile(sourceItem):
            """ignore system files"""
            if f.startswith("."):
                continue
            if not f in destinationList:
                print "Copying file: " + f
                shutil.copyfile(sourceItem, destinationItem)
        elif os.path.isdir(sourceItem):
            if not f in destinationList:
                print "Creating dir: " + f
                os.makedirs(destinationItem)
            walkDirectory(sourceItem, destinationItem)


"""Make sure starting destination path exists"""
source = getSourceDirectory()
destination = getDestinationDirectory()
if not os.path.exists(destination):
    os.makedirs(destination)
walkDirectory(source, destination)
As others mentioned, you probably want to use walk from the built-in os module. Also, consider using PEP 8 compatible style (no camel case, but this_style_of_function_naming()). Wrapping directly executable code (i.e. not a library/module) in an if __name__ == '__main__': ... block is also good practice.
The code
has no docstring describing what it does
re-invents the "battery" of shutil.copytree
has a function called walkDirectory which doesn't do what its name implies
contains get* functions that provide no utility
those get functions embed high-level arguments deeper than they ought
is obligatorily chatty (print whether you want it or not)
Use os.path.walk. It does most all the bookkeeping for you; you then just feed a visitor function to it to do what you need.
Or, oh damn, looks like os.path.walk has been deprecated. Use os.walk then, and you get:
for r, d, f in os.walk('/root/path'):
    for file in f:
        # do something good.
        pass
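On the shutil.copytree point above, a hedged sketch of leaning on that battery instead (paths from the question; note that copytree expects the destination directory not to exist yet, and it recopies everything rather than skipping files that are already backed up):

# Sketch: let the standard library do the recursive copy, skipping dotfiles.
import shutil

source = "/Users/robert/Music/iTunes/iTunes Media/"
destination = "/Users/robert/Desktop/Backup/"

shutil.copytree(source, destination, ignore=shutil.ignore_patterns(".*"))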
I recommend using os.walk. It does what it looks like you're doing. It offers a nice interface that's easy to utilize to do whatever you need.
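For instance, a rough sketch (paths from the question) of the same incremental backup driven by os.walk rather than hand-rolled recursion:

# Sketch: mirror a source tree into a destination, skipping dotfiles and
# files that already exist on the destination side.
import os
import shutil

source = "/Users/robert/Music/iTunes/iTunes Media/"
destination = "/Users/robert/Desktop/Backup/"

for root, dirs, files in os.walk(source):
    target = os.path.join(destination, os.path.relpath(root, source))
    if not os.path.isdir(target):
        os.makedirs(target)
    for name in files:
        if name.startswith("."):
            continue
        if not os.path.exists(os.path.join(target, name)):
            shutil.copyfile(os.path.join(root, name), os.path.join(target, name))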
The main thing to make things more Pythonic is to adopt Python's PEP 8, the style guide. It uses underscores for function names. [1]
If you're returning a fixed string, e.g. your get* functions, a variable is probably a
better approach. By this, I mean replace your getSourceDirectory with something like this:
source_directory = "/Users/robert/Music/iTunes/iTunes Media/"
Adding the following conditional will mean that code that is specific for running the module as a program does not get called when the module is imported.
if __name__ == '__main__':
    source = getSourceDirectory()
    destination = getDestinationDirectory()
    if not os.path.exists(destination):
        os.makedirs(destination)
    walkDirectory(source, destination)
I would use a try & except block, rather than a conditional, to test whether walkDirectory can operate successfully. Weird things can happen with multiple processes & filesystems:
try:
    walkDirectory(source, destination)
except IOError:
    os.makedirs(destination)
    walkDirectory(source, destination)
[1] I've left out discussion about whether to use the standard library. At this stage of your Python journey, I think you're just after a feel for how the language should be used in general terms. I don't think knowing the details of os.walk is really that important right now.
