Using getopt() to get values passed from the command line - python

I am writing a Python script to create directories for a given term and course. I would like to make use of the Python modules os, sys, and getopt (with both short and long form options), so that running the script would look like:
>python directory.py -t fall2013 -c cs311-400
>python directory.py --term fall2013 --class cs311-400
The code that I have written now looks like this:
import os
import sys
import getopt
term = ""
course = ""
options, args = getopt.getopt(sys.argv[1:], 't:c:', ['term=', 'course='])
for opt, arg in options:
    if opt in ('-t', '--term'):
        term = arg
    elif opt in ('-c', '--course'):
        course = arg
After this, I have a function that takes in the term and course and uses os.mkdir and such:
def make_folders(term, course):
    if not os.path.isdir(term + course):
        os.mkdir(term + course)
    path = os.path.join(term + course, "assignments")
    os.makedirs(path)
    path = os.path.join(term + course, "examples")
    os.makedirs(path)
    path = os.path.join(term + course, "exams")
    os.makedirs(path)
    path = os.path.join(term + course, "lecture_notes")
    os.makedirs(path)
    path = os.path.join(term + course, "submissions")
    os.makedirs(path)
make_folders(term, course)
For some reason, the folder that gets made only has a name that represents the term rather than both the term and the course. I feel like this might have something to do with my use of getopt, but I'm not certain. Any advice?

os.path.join is a clever function. Just pass as many folders as you need:
>>> import os
>>> os.path.join("first", "second", "third")
'first/second/third'

When you write term + course, Python concatenates the strings in term and course directly, before os.path.join() even sees them. That is, if, say, term == "fall2013" and course == "cs311-400", then term + course == "fall2013cs311-400" with nothing in between.
One way around that would be to insert an explicit slash between the term and the course, as in term + "/" + course. However, since you've presumably been instructed to use os.path.join() (which is a good idea, anyway), you can just pass all the path components you want to join to it as separate arguments and let it take care of joining them for you:
path = os.path.join(term, course, "exams")
Also, a few tips for your assignment, and for good Python coding in general:
While the getopt module is not actually deprecated like rtrwalker claims in the comments, you're probably better off using argparse unless you have to use getopt for some reason (like, say, the assignment tells you to). See the sketch after these tips.
Your code looks very repetitive. Repetitive code is a "smell" that should suggest the need for a loop, perhaps like this:
dirs = ("assignments", "examples", "exams", "lecture_notes", "submissions")
for folder in dirs:
    path = os.path.join(term, course, folder)
    os.makedirs(path)
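To illustrate the first tip, here's roughly what the same option handling could look like with argparse. This is only a sketch: the option names mirror your getopt code, and make_folders is the function from your script above.
import argparse

parser = argparse.ArgumentParser(description="Create course directories.")
parser.add_argument('-t', '--term', required=True, help="e.g. fall2013")
parser.add_argument('-c', '--course', required=True, help="e.g. cs311-400")
args = parser.parse_args()

# argparse exits with a usage message on bad input, and -h/--help comes for free
make_folders(args.term, args.course)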

I'm actually in your class; at least I'm almost positive I am. I was running into this exact problem and was having trouble getting that sub-directory. A Google search landed me here! Well, after a few more Googles and LOTS of troubleshooting, I found the answer to our problem!
os.makedirs(term + '/' + course)
path = os.path.join(term + '/' + course, "exams")
os.makedirs(path)
That should clean it up for you and give you your new directory and sub-directories! Good luck with the rest of the assignment.

Related

Is there a graceful way to use os.path.join() when the right-hand side may be /-prefixed?

In the code below, context._arguments['ConfigFile'] returns a string like '/path/file.py' (which I can't change) but due to the way os.path.join() works, I need to remove at bare minimum the first /.
Note: In my use case __file__ will always be in the appropriate position away from the config file.
I also considered giving it context._arguments['ConfigFile'][1:] but I think it's less robust.
config_file = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    *context._arguments['ConfigFile'].split(os.path.sep))
I expected there to be something a little more graceful, but maybe handling paths just never is. I am using Python 2.7 but for completeness I'm open to hearing Python 3 answers.
If you use Python 3, you can benefit from the pathlib package:
from pathlib import Path
file_path = '/path/file.py'
config_file = Path(__file__).parent / file_path.lstrip('/')
print(config_file)
# /Users/darius/repos/stackoverflow/questions/path/file.py
If you use Python 2, you can install pathlib2 (pip install pathlib2) which is a backport of the standard pathlib package. To match the module names you can rename the import with import pathlib2 as pathlib.
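For code that has to run under both versions, a guarded import keeps the rest of the code identical. A small sketch of that pattern, reusing the '/path/file.py' example from above:
try:
    import pathlib                    # Python 3.4+
except ImportError:
    import pathlib2 as pathlib        # Python 2 backport: pip install pathlib2

config_file = pathlib.Path(__file__).parent / '/path/file.py'.lstrip('/')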
(This is a response to a comment, really, but needs formatting.)
>>> os.path.join('/a', '/b/c')
'/b/c'
>>> os.path.join('/a', './/b/c')
'/a/.//b/c'
Use os.path.normpath to clean up:
>>> os.path.normpath(os.path.join('/a', './/b/c'))
'/a/b/c'
The other way to view this is that, at least on Unix systems, os.path.join starts with its first argument. Then, for each additional argument, it either concatenates or replaces using the return-value-so-far and the extra path component:
def unix_style_join(*args):
"low quality version, for illustration"
ret = args[0]
for extra in args[1:]:
if extra.startswith('/'):
ret = extra
else:
ret = ret + '/' + extra
return ret
Since your problem is that context._arguments['ConfigFile'] starts with /, we merely need a variant of context._arguments['ConfigFile'] that means the same thing but does not start with / ... and ./<whatever> means the same as <whatever> except that ./<whatever> starts with ., even if <whatever> starts with /.
The reason I didn't suggest this as the whole answer is that I have no idea how this all works on Windows.
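Putting those pieces together for the question's code, it might look roughly like the sketch below. The helper name resolve_config is mine, it assumes context._arguments['ConfigFile'] always holds an absolute-style path like '/path/file.py', and the Windows caveat above still applies.
import os

def resolve_config(config_arg):
    # config_arg is e.g. '/path/file.py' (the question's context._arguments['ConfigFile'])
    return os.path.normpath(os.path.join(
        os.path.dirname(os.path.abspath(__file__)),
        '.' + config_arg))            # the './' prefix defuses the leading '/'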

Searching filesystem using Python

I have been coding in Python for the last 2 weeks and am pretty new to it.
I have written some code to kind of emulate the way the "find" command works on *NIX systems. My code works okay-ish for not-so-deep directories, but if I start searching from the root directory it takes too much time and the processor heats up :D, whereas the "find" command takes about 8 seconds.
Hey, I know I'm kind of a noob at Python, but any hint at improving the search efficiency will be greatly appreciated.
Here's what I have written:
#!/usr/bin/python3
import os
class srchx:
    file_names = []
    is_prohibit = False

    def show_result(self):
        if(self.is_prohibit):
            print("some directories were denied read-access")
        print("\nsearch returned {0} result(s)".format(len(self.file_names)))
        for _file in self.file_names:
            print(_file)

    def read_dir(self, cur_dir, srch_name, level):
        try:
            listing = os.listdir(cur_dir)
        except:
            self.is_prohibit = True
            return
        dir_list = []
        #print("-"*level+cur_dir)
        for entry in listing:
            if(os.path.isdir(cur_dir+"/"+entry)):
                dir_list.append(entry)
            else:
                if(srch_name == entry):
                    self.file_names.append(cur_dir+"/"+entry)
        for _dir in dir_list:
            new_dir = cur_dir + "/" + _dir
            self.read_dir(new_dir, srch_name, level+1)
        if(level == 0):
            self.show_result()

    def __init__(self, dir_name=os.getcwd()):
        srch_name = ""
        while(len(srch_name) == 0):
            srch_name = input("search for: ")
        self.read_dir(dir_name, srch_name, 0)

def main():
    srch = srchx()

if (__name__ == "__main__"):
    main()
Please take a look and help me solve this issue.
There is a built-in directory-walking function, os.walk(), but even os.walk() is slow; if you want to browse faster, you need access to the operating system's own file-browsing calls.
https://pypi.python.org/pypi/scandir
scandir is a solution.
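(For what it's worth, scandir was later merged into the standard library as os.scandir in Python 3.5, and os.walk is built on it there.) A minimal sketch of using it directly, with a made-up directory and file name:
import os

with os.scandir('/some/dir') as entries:      # the context-manager form needs Python 3.6+
    for entry in entries:
        if entry.is_file() and entry.name == 'target.txt':
            print(entry.path)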
What user1767754 said. You can't really improve the speed much using the methods you're calling. os.walk() is a bit more efficient, though. I've never used scandir (or pypi) so I can't comment.
BTW, that's rather good looking code for a noob, Marty! But there are a couple of issues with it.
It's not a good idea to initialise file_names and is_prohibit like that because it makes them class variables; initialise them in __init__.
You should read srch_name outside the class and pass it to your class constructor. You do that by making it an arg of __init__, as described in the link above.
It's generally good policy to handle user input in the outermost parts of your code (when practical) rather than doing it in the inner parts of your code. I like to think of my user input routines as border guards that only let good input into the inner sanctum of my code. Users are unpredictable critters and there's no telling what mischief they'll get up to. :)
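Putting those suggestions together, here is a rough sketch of what the same search could look like built on os.walk. It is just one way to restructure it; the class and method names are my own.
import os

class Searcher:
    def __init__(self, srch_name, dir_name=None):
        # instance variables set up in __init__, not class variables
        self.srch_name = srch_name
        self.dir_name = dir_name if dir_name is not None else os.getcwd()
        self.file_names = []
        self.is_prohibit = False

    def run(self):
        # os.walk does the recursion and bookkeeping for us
        for root, dirs, files in os.walk(self.dir_name, onerror=self._on_error):
            if self.srch_name in files:
                self.file_names.append(os.path.join(root, self.srch_name))
        return self.file_names

    def _on_error(self, err):
        # called for directories we were denied read access to, among other errors
        self.is_prohibit = True

def main():
    # user input handled at the outermost layer, as suggested above
    srch_name = ""
    while len(srch_name) == 0:
        srch_name = input("search for: ")
    searcher = Searcher(srch_name)
    for path in searcher.run():
        print(path)
    if searcher.is_prohibit:
        print("some directories were denied read-access")

if __name__ == "__main__":
    main()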

Check if file system is case-insensitive in Python

Is there a simple way to check in Python if a file system is case insensitive? I'm thinking in particular of file systems like HFS+ (OSX) and NTFS (Windows), where you can access the same file as foo, Foo or FOO, even though the file case is preserved.
import os
import tempfile
# By default mkstemp() creates a file with
# a name that begins with 'tmp' (lowercase)
tmphandle, tmppath = tempfile.mkstemp()
if os.path.exists(tmppath.upper()):
    # Case insensitive.
    pass
else:
    # Case sensitive.
    pass
The answer provided by Amber will leave temporary file debris unless closing and deleting are handled explicitly. To avoid this I use:
import os
import tempfile
def is_fs_case_sensitive():
    #
    # Force case with the prefix
    #
    with tempfile.NamedTemporaryFile(prefix='TmP') as tmp_file:
        return(not os.path.exists(tmp_file.name.lower()))
Though my usage cases generally test this more than once, so I stash the result to avoid having to touch the filesystem more than once.
def is_fs_case_sensitive():
    if not hasattr(is_fs_case_sensitive, 'case_sensitive'):
        with tempfile.NamedTemporaryFile(prefix='TmP') as tmp_file:
            setattr(is_fs_case_sensitive,
                    'case_sensitive',
                    not os.path.exists(tmp_file.name.lower()))
    return(is_fs_case_sensitive.case_sensitive)
Which is marginally slower if only called once, and significantly faster in every other case.
Good point on the different file systems, etc., Eric Smith. But why not use tempfile.NamedTemporaryFile with the dir parameter and avoid doing all that context manager lifting yourself?
def is_fs_case_sensitive(path):
    #
    # Force case with the prefix
    #
    with tempfile.NamedTemporaryFile(prefix='TmP', dir=path, delete=True) as tmp_file:
        return(not os.path.exists(tmp_file.name.lower()))
I should also mention that your solution does not guarantee that you are actually testing for case sensitivity unless you check the default prefix (using tempfile.gettempprefix()) to make sure it contains a lower-case character. So including the prefix here is not really optional.
Your solution cleans up the temp file. I agree that it seemed obvious, but one never knows, do one?
A variation on @Shrikant's answer, applicable within a module (i.e. not in the REPL), even if your user doesn't have a home directory:
import os.path
is_fs_case_insensitive = os.path.exists(__file__.upper()) and os.path.exists(__file__.lower())
print(f"{is_fs_case_insensitive=}")
output (macOS):
is_fs_case_insensitive=True 👈
And the Linux side of things:
(ssha)vagrant ~$python3.8 test.py
is_fs_case_insensitive=False 👈
(ssha)vagrant ~$lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04 LTS
Release: 20.04
Codename: focal
FWIW, I checked pathlib, os, os.path's contents via:
[k for k in vars(pathlib).keys() if "case" in k.lower()]
and nothing looks like it, though it does have a pathlib.supports_symlinks but nothing about case-sensitivity.
And the following will work in the REPL as well:
is_fs_case_insensitive = os.path.exists(os.path.__file__.upper()) and os.path.exists(os.path.__file__.lower())
Starting with Amber's answer, I came up with this code. I'm not sure it is totally robust, but it attempts to address some issues in the original (that I'll mention below).
import os
import sys
import tempfile
import contextlib
def is_case_sensitive(path):
    with temp(path) as tmppath:
        head, tail = os.path.split(tmppath)
        testpath = os.path.join(head, tail.upper())
        return not os.path.exists(testpath)

@contextlib.contextmanager
def temp(path):
    tmphandle, tmppath = tempfile.mkstemp(dir=path)
    os.close(tmphandle)
    try:
        yield tmppath
    finally:
        os.unlink(tmppath)

if __name__ == '__main__':
    path = os.path.abspath(sys.argv[1])
    print(path)
    print('Case sensitive: ' + str(is_case_sensitive(path)))
Without specifying the dir parameter in mkstemp, the question of case sensitivity is vague. You're testing case sensitivity of wherever the temporary directory happens to be, but you may want to know about a specific path.
If you convert the full path returned from mkstemp to upper-case, you could potentially miss a transition somewhere in the path. For example, I have a USB flash drive on Linux mounted using vfat at /media/FLASH. Testing the existence of anything under /MEDIA/FLASH will always fail because /media is on a (case-sensitive) ext4 partition, but the flash drive itself is case-insensitive. Mounted network shares could be another situation like this.
Finally, and maybe it goes without saying in Amber's answer, you'll want to clean up the temp file created by mkstemp.
I think there's a much simpler (and probably faster) solution to this. The following seemed to work where I tested it:
import os.path
home = os.path.expanduser('~')
is_fs_case_insensitive = os.path.exists(home.upper()) and os.path.exists(home.lower())
import os
if os.path.normcase('A') == os.path.normcase('a'):
    # case insensitive
    pass
else:
    # case sensitive
    pass
I think we can do this in one line with pathlib on Python 3.5+ without creating temporary files:
from pathlib import Path
def is_case_insensitive(path) -> bool:
    return Path(str(Path.home()).upper()).exists()
Or for the inverse:
def is_case_sensitive(path) -> bool:
    return not Path(str(Path.home()).upper()).exists()
I believe this to be the simplest solution to the question:
from fnmatch import fnmatch
os_is_case_insensitive = fnmatch('A','a')
From: https://docs.python.org/3.4/library/fnmatch.html
If the operating system is case-insensitive, then both parameters will
be normalized to all lower- or upper-case before the comparison is
performed.

Is there a more Pythonic approach to this?

This is my first python script, be ye warned.
I pieced this together from Dive Into Python, and it works great. However since it is my first Python script I would appreciate any tips on how it can be made better or approaches that may better embrace the Python way of programming.
import os
import shutil
def getSourceDirectory():
"""Get the starting source path of folders/files to backup"""
return "/Users/robert/Music/iTunes/iTunes Media/"
def getDestinationDirectory():
"""Get the starting destination path for backup"""
return "/Users/robert/Desktop/Backup/"
def walkDirectory(source, destination):
"""Walk the path and iterate directories and files"""
sourceList = [os.path.normcase(f)
for f in os.listdir(source)]
destinationList = [os.path.normcase(f)
for f in os.listdir(destination)]
for f in sourceList:
sourceItem = os.path.join(source, f)
destinationItem = os.path.join(destination, f)
if os.path.isfile(sourceItem):
"""ignore system files"""
if f.startswith("."):
continue
if not f in destinationList:
"Copying file: " + f
shutil.copyfile(sourceItem, destinationItem)
elif os.path.isdir(sourceItem):
if not f in destinationList:
print "Creating dir: " + f
os.makedirs(destinationItem)
walkDirectory(sourceItem, destinationItem)
"""Make sure starting destination path exists"""
source = getSourceDirectory()
destination = getDestinationDirectory()
if not os.path.exists(destination):
os.makedirs(destination)
walkDirectory(source, destination)
As others mentioned, you probably want to use walk from the built-in os module. Also, consider using PEP 8 compatible style (no camel case, but this_style_of_function_naming()). Wrapping directly executable code (i.e. not a library/module) in an if __name__ == '__main__': ... block is also good practice.
The code
has no docstring describing what it does
re-invents the "battery" of shutil.copytree
has a function called walkDirectory which doesn't do what its name implies
contains get* functions that provide no utility
those get functions embed high-level arguments deeper than they ought
is obligatorily chatty (print whether you want it or not)
Use os.path.walk. It does most of the bookkeeping for you; you then just feed a visitor function to it to do what you need.
Or, oh damn, looks like os.path.walk has been deprecated. Use os.walk then, and you get
for r, d, f in os.walk('/root/path'):
    for file in f:
        # do something good.
        pass
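To expand on the shutil.copytree point above, the bulk of the script could be replaced with something along these lines. It is a sketch, not a drop-in replacement: copytree copies everything rather than only the missing files, and on older Python versions it requires that the destination directory not exist yet.
import shutil

source = "/Users/robert/Music/iTunes/iTunes Media/"
destination = "/Users/robert/Desktop/Backup/"

# ignore_patterns('.*') skips the dot-files the original code filters out by hand
shutil.copytree(source, destination, ignore=shutil.ignore_patterns('.*'))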
I recommend using os.walk. It does what it looks like you're doing. It offers a nice interface that's easy to utilize to do whatever you need.
The main thing to make things more Pythonic is to adopt Python's PEP 8, the style guide. It uses underscores for functions. [1]
If you're returning a fixed string, e.g. your get* functions, a variable is probably a
better approach. By this, I mean replace your getSourceDirectory with something like this:
source_directory = "/Users/robert/Music/iTunes/iTunes Media/"
Adding the following conditional will mean that code that is specific for running the module as a program does not get called when the module is imported.
if __name__ == '__main__':
    source = getSourceDirectory()
    destination = getDestinationDirectory()
    if not os.path.exists(destination):
        os.makedirs(destination)
    walkDirectory(source, destination)
I would use a try & except block, rather than a conditional to test if walkDirectory can operate successfully. Weird things can happen with multiple processes & filesystems:
try:
    walkDirectory(source, destination)
except IOError:
    os.makedirs(destination)
    walkDirectory(source, destination)
[1] I've left out discussion about whether to use the standard library. At this stage of your Python journey, I think you're just after a feel for how the language should be used in general terms. I don't think knowing the details of os.walk is really that important right now.

os.path.basename works with URLs, why?

>>> os.path.basename('http://example.com/file.txt')
'file.txt'
.. and I thought os.path.* work only on local paths and not URLs? Note that the above example was run on Windows too .. with similar result.
In practice many functions of os.path are just string manipulation functions (which just happen to be especially handy for path manipulation) -- and since that's innocuous and occasionally handy, while formally speaking "incorrect", I doubt this will change anytime soon -- for more details, use the following simple one-liner at a shell/command prompt:
$ python -c"import sys; import StringIO; x = StringIO.StringIO(); sys.stdout = x; import this; sys.stdout = sys.__stdout__; print x.getvalue().splitlines()[10][9:]"
Or, for Python 3:
$ python -c"import sys; import io; x = io.StringIO(); sys.stdout = x; import this; sys.stdout = sys.__stdout__; print(x.getvalue().splitlines()[10][9:])"
On windows, look at the source code: C:\Python25\Lib\ntpath.py
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
os.path.split (in the same file) just split "\" (and sth. else)
Beware of URLs with parameters, anchors or anything that isn't a "plain" URL:
>>> import os.path
>>> os.path.basename("protocol://fully.qualifie.host/path/to/file.txt")
'file.txt'
>>> os.path.basename("protocol://fully.qualifie.host/path/to/file.txt?param1&param1#anchor")
'file.txt?param1&param1#anchor'
Use the source Luke:
def basename(p):
"""Returns the final component of a pathname"""
i = p.rfind('/') + 1
return p[i:]
Edit (response to clarification):
It works for URLs by accident, that's it. Because of that, exploiting its behaviour could be considered code smell by some.
Trying to "fix" it (check if passed path is not url) is also surprisingly difficult
www.google.com/test.php
me#other.place.com/12
./src/bin/doc/goto.c
are at the same time valid pathnames and (relative) URLs, and so is http:/hello.txt (one /, and only on Linux, and it's kinda stupid :)). You could "fix" it for absolute URLs, but relative ones would still work. Handling one special case differently is a big no-no in the Python world.
To sum it up: import this
Forward slash is also an acceptable path delimiter in Windows.
It is merely that the command line does not accept paths that begin with a / because that character is reserved for args switches.
Why? Because it's useful for parsing URLs as well as local file paths. Why not?
