Is there a more Pythonic approach to this?

Is there a more Pythonic approach to this? - python

This is my first python script, be ye warned.
I pieced this together from Dive Into Python, and it works great. However since it is my first Python script I would appreciate any tips on how it can be made better or approaches that may better embrace the Python way of programming.
import os
import shutil
def getSourceDirectory():
"""Get the starting source path of folders/files to backup"""
return "/Users/robert/Music/iTunes/iTunes Media/"
def getDestinationDirectory():
"""Get the starting destination path for backup"""
return "/Users/robert/Desktop/Backup/"
def walkDirectory(source, destination):
"""Walk the path and iterate directories and files"""
sourceList = [os.path.normcase(f)
for f in os.listdir(source)]
destinationList = [os.path.normcase(f)
for f in os.listdir(destination)]
for f in sourceList:
sourceItem = os.path.join(source, f)
destinationItem = os.path.join(destination, f)
if os.path.isfile(sourceItem):
"""ignore system files"""
if f.startswith("."):
continue
if not f in destinationList:
"Copying file: " + f
shutil.copyfile(sourceItem, destinationItem)
elif os.path.isdir(sourceItem):
if not f in destinationList:
print "Creating dir: " + f
os.makedirs(destinationItem)
walkDirectory(sourceItem, destinationItem)
"""Make sure starting destination path exists"""
source = getSourceDirectory()
destination = getDestinationDirectory()
if not os.path.exists(destination):
os.makedirs(destination)
walkDirectory(source, destination)

As others mentioned, you probably want to use walk from the built-in os module. Also, consider using PEP 8 compatible style (no camel-case but this_stye_of_function_naming()). Wrapping directly executable code (i.e. no library/module) into a if __name__ == '__main__': ... block is also a good practice.

The code
has no docstring describing what it does
re-invents the "battery" of shutil.copytree
has a function called walkDirectory which doesn't do what its name implies
contains get* functions that provide no utility
those get functions embed high-level arguments deeper than they ought
is obligatorily chatty (print whether you want it or not)

Use os.path.walk. It does most all the bookkeeping for you; you then just feed a visitor function to it to do what you need.
Or, oh damn, looks like os.path.walk has been deprecated. Use os.walk then, and you get
for r, d, f in os.walk('/root/path')
for file in f:
# do something good.

I recommend using os.walk. It does what it looks like you're doing. It offers a nice interface that's easy to utilize to do whatever you need.

The main thing to make things more Pythonic is to adopt Python's PEP8, the style guide. It uses underscore for functions.1
If you're returning a fixed string, e.g. your get* functions, a variable is probably a
better approach. By this, I mean replace your getSourceDirectory with something like this:
source_directory = "/Users/robert/Music/iTunes/iTunes Media/"
Adding the following conditional will mean that code that is specific for running the module as a program does not get called when the module is imported.
if __name__ == '__main__':
source = getSourceDirectory()
destination = getDestinationDirectory()
if not os.path.exists(destination):
os.makedirs(destination)
walkDirectory(source, destination)
I would use a try & except block, rather than a conditional to test if walkDirectory can operate successfully. Weird things can happen with multiple processes & filesystems:
try:
walkDirectory(source, destination)
except IOError:
os.makedirs(destination)
walkDirectory(source, destination)
1 I've left out discussion about whether to use the standard library. At this stage of your Python journey, I think you're just after a feel how the language should be used in general terms. I don't think knowing the details of os.walk is really that important right now.

Related

Proper position of imports & definitions in called scripts?

I have one script that calls another - let's call them master.py and fetch.py. I suppose the second script could be integrated into the first, but it does have distinct functionality - so keeping them separate seems like a good way to force myself to learn how to call outside scripts.
Here's the basic structure of fetch.py:
<import block>
infiles = <paths>
arcpy.env.workspace = os.path.dirname(infile)
ws = arcpy.env.workspace
newfile_list = []
def main():
name = <name>
if not arcpy.Exists(name + ".gdb"):
global ws
new_gdb = DM.CreateFileGDB(ws, name + ".gdb")
newfile_list.append(new_gdb)
other_func1()
other_func2()
print "\nNew files from fetch.py:"
for i in newfile_list:
print " " + i
def other_func1():
stuff
def other_func2():
stuff
if __name__ == '__main__':
main()
And master.py:
<import block>
infiles = <paths>
def f1():
stuff
def f2():
stuff
import fetch
fetch.main()
f1()
f2()
The problems, concerning placement of the import block & file definitions of fetch.py:
When I put them inside main() and run it as a standalone script, my arcpy functions don't work because my various imports haven't run yet. Putting them ahead of main() and making them global, solves this problem.
But when I put them outside of main() as you see here, I get an error saying the local variable ws is referenced before assignment. I think this may have to do with my calling fetch.main(), where the initial lines don't get read. (I was able to make it work by declaring global ws, but don't know if this is advisable.)
How do I structure fetch.py so that the import statements and file definitions get read, both when run as a standalone script and when called?

Instead of rewriting your code for you, I will try to point you to some ideas from the developer philosophy toolkit to get you on the way how to refine your approach.
My feeling is that you should contemplate YAGNI and KISS when thinking about your code.
In your comment you write:
I'm keeping the 'fetch' script separate because finding feature orientation seems like a useful stand-alone tool down the road.
Like I say: YAGNI and KISS: If you need it as a standalone tool in the future then split it out in the future. For now keep it simple and put the code that belongs together in one module and make it callable exactly the way you need it. When you work with it and understand the problem better and see what else is needed, you can then add other ways of calling your script. There is no reason why you shouldn't keep all the code in one module while it is still manageable and just add different ways to call it.
If the time comes that you want to organize your code into several modules you can refactor it in a way that you pull out the things you do on the module level at the moment (either into functions or maybe even classes). Then you can control exactly what should happen when and you can control the order of things better.
As further reading in regards to structuring a Python project I would recommend the pypa sample project and Starting A Python Project The Right Way. The problems you have with imports will likely cease to exist if you structure your code and your project in a sensible, pythonic way.

Searching filesystem using Python

I have been coding in Python since the last 2 weeks and pretty new to it.
I have written a code to kind of emulate the way "find" command works in *NIX systems. My code works okay-ish for not so deep directories but if I start searching from the "root" directory, it takes too much time and processor heats up :D which on the other hand takes about 8 seconds using "find" cmd.
Hey I know I am kinda noob in Python now but any hint at trying to improve the search efficiency will be greatly appreciated.
Here's what I have written:
#!/usr/bin/python3
import os
class srchx:
file_names = []
is_prohibit = False
def show_result(self):
if(self.is_prohibit):
print("some directories were denied read-access")
print("\nsearch returned {0} result(s)".format(len(self.file_names)))
for _file in self.file_names:
print(_file)
def read_dir(self, cur_dir, srch_name, level):
try:
listing = os.listdir(cur_dir)
except:
self.is_prohibit = True
return
dir_list = []
#print("-"*level+cur_dir)
for entry in listing:
if(os.path.isdir(cur_dir+"/"+entry)):
dir_list.append(entry)
else:
if(srch_name == entry):
self.file_names.append(cur_dir+"/"+entry)
for _dir in dir_list:
new_dir = cur_dir + "/" + _dir
self.read_dir(new_dir, srch_name, level+1)
if(level == 0):
self.show_result()
def __init__(self, dir_name=os.getcwd()):
srch_name = ""
while(len(srch_name) == 0):
srch_name = input("search for: ")
self.read_dir(dir_name, srch_name, 0)
def main():
srch = srchx()
if (__name__ == "__main__"):
main()
Take a look at and please help me to solve this issue.

There is a built-in Directory-Browsing Framework called os.walk() but even os.walk() is slow, if you want to browse faster, you need access to the operating systems file-browser.
https://pypi.python.org/pypi/scandir
scandir is a solution.

What user1767754 said. You can't really improve the speed much using the methods you're calling. os.walk() is a bit more efficient, though. I've never used scandir (or pypi) so I can't comment.
BTW, that's rather good looking code for a noob, Marty! But there are a couple of issues with it.
It's not a good idea to initialise file_names and is_prohibit like that because it makes them class variables; initialise them in __init__.
You should read srch_name outside the class and pass it your class constructor. You do that by making it an arg of __init__, as described in the link above.
It's generally good policy to handle user input in the outermost parts of your code (when practical) rather than doing it in the inner parts of your code. I like to think of my user input routines as border guards that only let good input into the inner sanctum of my code. Users are unpredictable critters and there's no telling what mischief they'll get up to. :)

Using getopt() to get values passed from the command line

I am writing a Python script to create directories for a given term and course. I would like to make use of the Python modules os, sys, and getopt (with both short and long form options) so that the running script would look like:
>python directory.py –t fall2013 –c cs311-400
>python directory.py –-term fall2013 –-class cs311-400
The code that I have write now looks like this:
import os
import sys
import getopt
term = ""
course = ""
options, args = getopt.getopt(sys.argv[1:], 't:c:', ['term=', 'course='])
for opt, arg in options:
if opt in ('-t', '--term'):
term = arg
elif opt in ('-c', '--course'):
course = arg
After this, I have a function that takes in the term and course an uses os.mkdir and such:
def make_folders(term, course):
if not os.path.isdir(term + course):
os.mkdir(term + course)
path = os.path.join(term + course, "assignments")
os.makedirs(path)
path = os.path.join(term + course, "examples")
os.makedirs(path)
path = os.path.join(term + course, "exams")
os.makedirs(path)
path = os.path.join(term + course, "lecture_notes")
os.makedirs(path)
path = os.path.join(term + course, "submissions")
os.makedirs(path)
make_folders(term, course)
For some reason, the folder that gets made only has a name that represents the term rather than both the term and the course. I feel like this might have something to do with my use of getopt, but I'm not certain. Any advice?

os.path.join is a clever function. Just pass as many folders as you need:
>>> import os
>>> os.path.join("first", "second", "third")
'first/second/third'

When you write term + course, Python concatenates the strings in term and course directly, before os.path.join() even sees them. That is, if, say, term == "fall2013" and course == "cs311-400", then term + course == "fall2013cs311-400" with nothing in between.
One way around that would be to insert an explicit slash between the term and the course, as in term + "/" + course. However, since you've presumably been instructed to use os.path.join() (which is a good idea, anyway), you can just pass all the path components you want to join to it as separate arguments and let it take care of joining them for you:
path = os.path.join(term, course, "exams")
Also, a few tips for your assignment, and for good Python coding in general:
While the getopt module is not actually deprecated like rtrwalker claims in the comments, you're probably better off using argparse unless you have to use getopt for some reason (like, say, the assignment tells you to).
Your code looks very repetitive. Repetitive code is a "smell" that should suggest the need for a loop, perhaps like this:
dirs = ("assignments", "examples", "exams", "lecture_notes", "submissions")
for folder in dirs:
path = os.path.join(term, course, folder)
os.makedirs(path)

I'm actually in your class; at least I'm almost positive I am. Was running into this exact problem and was having trouble getting that sub-directory. A google search landed me here! Well, after a few more googles and LOTS of troubleshooting, found the answer to our problem!
os.makedirs(''+term+'/'+course+'')
path = os.path.join(''+term+'/'+course+'', "exams")
os.makedirs(path)
That should clean it up for you and give your your new directory and sub-directories! Good luck with the rest of the assignment.

Check if file system is case-insensitive in Python

Is there a simple way to check in Python if a file system is case insensitive? I'm thinking in particular of file systems like HFS+ (OSX) and NTFS (Windows), where you can access the same file as foo, Foo or FOO, even though the file case is preserved.

import os
import tempfile
# By default mkstemp() creates a file with
# a name that begins with 'tmp' (lowercase)
tmphandle, tmppath = tempfile.mkstemp()
if os.path.exists(tmppath.upper()):
# Case insensitive.
else:
# Case sensitive.

The answer provided by Amber will leave temporary file debris unless closing and deleting are handled explicitly. To avoid this I use:
import os
import tempfile
def is_fs_case_sensitive():
#
# Force case with the prefix
#
with tempfile.NamedTemporaryFile(prefix='TmP') as tmp_file:
return(not os.path.exists(tmp_file.name.lower()))
Though my usage cases generally test this more than once, so I stash the result to avoid having to touch the filesystem more than once.
def is_fs_case_sensitive():
if not hasattr(is_fs_case_sensitive, 'case_sensitive'):
with tempfile.NamedTemporaryFile(prefix='TmP') as tmp_file:
setattr(is_fs_case_sensitive,
'case_sensitive',
not os.path.exists(tmp_file.name.lower()))
return(is_fs_case_sensitive.case_sensitive)
Which is marginally slower if only called once, and significantly faster in every other case.

Good point on the different file systems, etc., Eric Smith. But why not use tempfile.NamedTemporaryFile with the dir parameter and avoid doing all that context manager lifting yourself?
def is_fs_case_sensitive(path):
#
# Force case with the prefix
#
with tempfile.NamedTemporaryFile(prefix='TmP',dir=path, delete=True) as tmp_file:
return(not os.path.exists(tmp_file.name.lower()))
I should also mention that your solution does not guarantee that you are actually testing for case sensitivity. Unless you check the default prefix (using tempfile.gettempprefix()) to make sure it contains a lower-case character. So including the prefix here is not really optional.
Your solution cleans up the temp file. I agree that it seemed obvious, but one never knows, do one?

Variation on #Shrikant's answer, applicable within a module (i.e. not in the REPL), even if your user doesn't have a home:
import os.path
is_fs_case_insensitive = os.path.exists(__file__.upper()) and os.path.exists(__file__.lower())
print(f"{is_fs_case_insensitive=}")
output (macOS):
is_fs_case_insensitive=True 👈
And the Linux side of things:
(ssha)vagrant ~$python3.8 test.py
is_fs_case_insensitive=False 👈
(ssha)vagrant ~$lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
FWIW, I checked pathlib, os, os.path's contents via:
[k for k in vars(pathlib).keys() if "case" in k.lower()]
and nothing looks like it, though it does have a pathlib.supports_symlinks but nothing about case-sensitivity.
And the following will work in the REPL as well:
is_fs_case_insensitive = os.path.exists(os.path.__file__.upper()) and os.path.exists(os.path.__file__.lower())

Starting with Amber's answer, I came up with this code. I'm not sure it is totally robust, but it attempts to address some issues in the original (that I'll mention below).
import os
import sys
import tempfile
import contextlib
def is_case_sensitive(path):
with temp(path) as tmppath:
head, tail = os.path.split(tmppath)
testpath = os.path.join(head, tail.upper())
return not os.path.exists(testpath)
#contextlib.contextmanager
def temp(path):
tmphandle, tmppath = tempfile.mkstemp(dir=path)
os.close(tmphandle)
try:
yield tmppath
finally:
os.unlink(tmppath)
if __name__ == '__main__':
path = os.path.abspath(sys.argv[1])
print(path)
print('Case sensitive: ' + str(is_case_sensitive(path)))
Without specifying the dir parameter in mkstemp, the question of case sensitivity is vague. You're testing case sensitivity of wherever the temporary directory happens to be, but you may want to know about a specific path.
If you convert the full path returned from mkstemp to upper-case, you could potentially miss a transition somewhere in the path. For example, I have a USB flash drive on Linux mounted using vfat at /media/FLASH. Testing the existence of anything under /MEDIA/FLASH will always fail because /media is on a (case-sensitive) ext4 partition, but the flash drive itself is case-insensitive. Mounted network shares could be another situation like this.
Finally, and maybe it goes without saying in Amber's answer, you'll want to clean up the temp file created by mkstemp.

I think there's a much simpler (and probably faster) solution to this. The following seemed to be working for where I tested:
import os.path
home = os.path.expanduser('~')
is_fs_case_insensitive = os.path.exists(home.upper()) and os.path.exists(home.lower())

import os
if os.path.normcase('A') == os.path.normcase('a'):
# case insensitive
else:
# case sensitive

I think we can do this in one line with pathlib on Python 3.5+ without creating temporary files:
from pathlib import Path
def is_case_insensitive(path) -> bool:
return Path(str(Path.home()).upper()).exists()
Or for the inverse:
def is_case_sensitive(path) -> bool:
return not Path(str(Path.home()).upper()).exists()

I believe this to be the simplest solution to the question:
from fnmatch import fnmatch
os_is_case_insensitive = fnmatch('A','a')
From: https://docs.python.org/3.4/library/fnmatch.html
If the operating system is case-insensitive, then both parameters will
be normalized to all lower- or upper-case before the comparison is
performed.

os.path.basename works with URLs, why?

>>> os.path.basename('http://example.com/file.txt')
'file.txt'
.. and I thought os.path.* work only on local paths and not URLs? Note that the above example was run on Windows too .. with similar result.

In practice many functions of os.path are just string manipulation functions (which just happen to be especially handy for path manipulation) -- and since that's innocuous and occasionally handy, while formally speaking "incorrect", I doubt this will change anytime soon -- for more details, use the following simple one-liner at a shell/command prompt:
$ python -c"import sys; import StringIO; x = StringIO.StringIO(); sys.stdout = x; import this; sys.stdout = sys.__stdout__; print x.getvalue().splitlines()[10][9:]"
Or, for Python 3:
$ python -c"import sys; import io; x = io.StringIO(); sys.stdout = x; import this; sys.stdout = sys.__stdout__; print(x.getvalue().splitlines()[10][9:])"

On windows, look at the source code: C:\Python25\Lib\ntpath.py
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
os.path.split (in the same file) just split "\" (and sth. else)

Beware of URLs with parameters, anchors or anything that isn't a "plain" URL:
>>> import os.path
>>> os.path.basename("protocol://fully.qualifie.host/path/to/file.txt")
'file.txt'
>>> os.path.basename("protocol://fully.qualifie.host/path/to/file.txt?param1&param1#anchor")
'file.txt?param1&param1#anchor'

Use the source Luke:
def basename(p):
"""Returns the final component of a pathname"""
i = p.rfind('/') + 1
return p[i:]
Edit (response to clarification):
It works for URLs by accident, that's it. Because of that, exploiting its behaviour could be considered code smell by some.
Trying to "fix" it (check if passed path is not url) is also surprisingly difficult
www.google.com/test.php
me#other.place.com/12
./src/bin/doc/goto.c
are at the same time correct pathnames and URLs (relative), so is the http:/hello.txt (one /, and only on linux, and it's kinda stupid :)). You could "fix" it for absolute urls but relative ones will still work. Handling one special case in differently is a big no no in the python world.
To sum it up: import this

Forward slash is also an acceptable path delimiter in Windows.
It is merely that the command line does not accept paths that begin with a / because that character is reserved for args switches.

Why? Because it's useful for parsing URLs as well as local file paths. Why not?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.