Windows cmd versus bash for sys.argv - Python

I was trying to run a Python script in Visual Studio 2015 and wanted to pass a path to my argparse function, but I kept receiving an OSError. See the update below: the problem appears to be a difference in how argparse receives values from cmd compared with bash.
Whether I specify it like this
C:\Users\Sayth\Documents\Racing\XML\*xml
or like this
C:\\Users\\Sayth\Documents\\Racing\\XML\\*xml
I get an OSError that the path is not found
OSError: Error reading file 'C:\\Users\\Sayth\\Documents\\Racing\\XML\\*xml': failed to load external entity "file:/C://Users//Sayth/Documents//Racing//XML//*xml"
Update
I copied the script and the XML file to a test directory. From there I ran the script in two different shells on Windows.
In cmd:
C:\Users\Sayth\Projects
λ python RaceHorse.py XML\*xml
Traceback (most recent call last):
File "RaceHorse.py", line 42, in <module>
tree = lxml.etree.parse(file)
File "lxml.etree.pyx", line 3427, in lxml.etree.parse (src\lxml\lxml.etree.c:79720)
File "parser.pxi", line 1782, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:115914)
File "parser.pxi", line 1808, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:116264)
File "parser.pxi", line 1712, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:115152)
File "parser.pxi", line 1115, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:109849)
File "parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:103323)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:104977)
File "parser.pxi", line 611, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:103843)
OSError: Error reading file 'XML\*xml': failed to load external entity "XML/*xml"
When I switch to Git Bash:
It reads the file. I still get an error, but the error shows that the file is being read.
Sayth#renshaw-laptop ~/Projects
λ python RaceHorse.py XML/*xml
Traceback (most recent call last):
File "RaceHorse.py", line 50, in <module>
nomination_table.append([race_id] + [nomination.attrib[name] for name in horseattrs])
File "RaceHorse.py", line 50, in <listcomp>
nomination_table.append([race_id] + [nomination.attrib[name] for name in horseattrs])
File "lxml.etree.pyx", line 2452, in lxml.etree._Attrib.__getitem__ (src\lxml\lxml.etree.c:68544)
KeyError: 'race_id'
I have a simple argparse function
parser = argparse.ArgumentParser(description=None)
def GetArgs(parser):
    """Parser function using argparse"""
    # parser.add_argument('directory', help='directory use',
    #                     action='store', nargs='*')
    parser.add_argument("files", nargs="+")
    return parser.parse_args()
fileList = GetArgs(parser)
Update 2
Based on the comments, I am trying to use glob so that the script also works in Windows shells.
glob raises an error saying that the object returned by the parser has no len().
Updated glob parser:
def GetArgs(parser):
    """Parser function using argparse"""
    # parser.add_argument('directory', help='directory use',
    #                     action='store', nargs='*')
    parser.add_argument("files", nargs="+")
    files = glob.glob(parser.parse_args())
    return files
filelist = GetArgs(parser)
Returns this error.
TypeError was unhandled by user code
Message: object of type 'Namespace' has no len()

The following should work with both the Windows cmd shell and bash because it will glob any filenames it receives (which can happen if the shell didn't do it already):
import argparse
from glob import glob
parser = argparse.ArgumentParser(description=None)
def GetArgs(parser):
    """Parser function using argparse"""
    parser.add_argument("files", nargs="+")
    namespace = parser.parse_args()
    files = [filename for filespec in namespace.files for filename in glob(filespec)]
    return files
filelist = GetArgs(parser)
However, I don't think having GetArgs() add arguments to the parser it was passed is a good design choice (because it could be an undesirable side-effect if the parser object is reused).
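A minimal sketch of that alternative design, assuming the parser is built inside the function instead of being passed in. The names expand_filespecs and parse_files are hypothetical, and keeping a pattern that matches nothing (so that a later "file not found" error still names it) is an assumption of this sketch, not part of the answer above.

```python
import argparse
from glob import glob


def expand_filespecs(filespecs):
    """Expand wildcard patterns that the shell left untouched (e.g. cmd.exe)."""
    files = []
    for filespec in filespecs:
        matches = sorted(glob(filespec))
        # Assumption: keep a spec that matches nothing, so a later
        # "file not found" error still names what the user typed.
        files.extend(matches if matches else [filespec])
    return files


def parse_files(argv=None):
    """Build a fresh parser on each call, so no shared parser is mutated."""
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument("files", nargs="+")
    namespace = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    return expand_filespecs(namespace.files)
```

Calling parse_files() with no argument reads the command line. Under bash the patterns arrive already expanded and globbing a plain filename simply returns it, so the same call should behave the same in both shells.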

Even though this is very short and simple, I think it deserves an answer rather than a comment: Python is multi-platform, so when you work with paths you should prefer using
from os import path
to avoid problems when running your app on different platforms.

Related

Python: Can't find file even though file referenced exists

I'm getting this error when trying to run a Python script. Is it saying that it can't find subprocess.py? Because I found it in the location it's listing there, so I doubt that's the issue. What file can't it find?
Traceback (most recent call last):
File "D:\Projects\PythonMathPlots\MandelbrotVideoGenerator.py", line 201, in <module>
run( ['open', 'MandelbrotZoom.mp4'] )
File "C:\Users\Aaron\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\Aaron\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\Users\Aaron\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
You may need to put full paths in the run(...) command, both to the open executable and to the .mp4 file.
Most likely, open does not exist on your system and you have to use the name of the video player software instead.
Make sure the user you're running the script as has read permission for the file.
You may also try subprocess.Popen(args, shell=True), which may help.
Also, build the path with path = os.path.join(filepath, filename) and, before passing it to Popen, check that os.path.exists(path) is True.
But note that there are some downsides to using shell=True:
Actual meaning of 'shell=True' in subprocess
https://medium.com/python-pandemonium/a-trap-of-shell-true-in-the-subprocess-module-6db7fc66cdfd
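As a rough sketch of the point about open not existing on the system: open is a macOS command, so a portable script has to pick a launcher per platform. The helper names below are hypothetical. xdg-open is assumed to be available on Linux, and os.startfile exists only on Windows.

```python
import os
import shutil
import subprocess
import sys


def launcher_for(platform):
    """Return the command-line launcher for a platform, or None on Windows."""
    if platform.startswith("win"):
        return None          # Windows: use os.startfile instead of a command
    if platform == "darwin":
        return "open"        # macOS
    return "xdg-open"        # assumption: a typical Linux desktop


def open_with_default_app(path):
    """Open a file with the default application, failing early with clear errors."""
    path = os.path.abspath(path)
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    launcher = launcher_for(sys.platform)
    if launcher is None:
        os.startfile(path)   # Windows only
    elif shutil.which(launcher) is None:
        raise FileNotFoundError("launcher not found on PATH: " + launcher)
    else:
        subprocess.run([launcher, path], check=True)
```

Checking the file and the launcher separately makes it clear which of the two the "cannot find the file specified" error refers to, without resorting to shell=True.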

Python subprocess FileNotFoundError

I am trying to follow this blog on how to execute an R script from Python. I have the R script working fine from the command line using Rscript.
Here's my Python code:
import subprocess
import os
command = "C:\Program Files\R\R-3.4.4\bin\Rscript"
path2script = os.getcwd() + "\max.R" # gives me the absolute path to the R script
args = ["11", "3", "9", "42"]
cmd = [command, path2script] + args
x = subprocess.check_output(cmd, universal_newlines = True)
Which gives me this error:
FileNotFoundError: [WinError 2] The system cannot find the file specified
I've read a lot of SO posts on this error and in most cases it seems to be a problem with trying to invoke system commands like dir or passing arguments to check_output in the wrong order but in my case I really don't see what should be going wrong.
Following some of the advice I've tried building a string for cmd instead of a list, and then passing it to check_output using the argument shell = True - when I do that I get a CalledProcessError: returned non-zero exit status 1.
I'm assuming this code, which is exactly as it appeared on the blog other than adding the absolute path to the file, is failing now because the behaviour of check_output has changed since 2015...
Can anyone help?
Here's the stack trace:
Traceback (most recent call last):
File "<ipython-input-2-3a0151808726>", line 1, in <module>
runfile('C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test/run_max.py', wdir='C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test')
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test/run_max.py", line 31, in <module>
x = subprocess.check_output(cmd, universal_newlines = True)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 210, in __init__
super(SubprocessPopen, self).__init__(*args, **kwargs)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Check that you have the right paths for the command and the script:
print(os.path.exists(command))
print(os.path.exists(path2script))
Note that writing paths with backslashes can be dangerous, because you may accidentally create an escape sequence that is interpreted differently (for example, "\b" in "\bin" is a backspace character). You can write Windows paths with forward slashes and then call os.path.normpath on them to turn them into a safe form.
(In command you can also use forward slashes only; the Python interpreter doesn't really care. In the path to your R script that would probably be a problem, though.)
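A short illustration of the escape-sequence problem, using the path from the question. It uses ntpath, the Windows flavour of os.path, only so that the example behaves the same on any operating system; on Windows, os.path.normpath is the same function.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# "\b" is the backspace escape, so "\bin" silently loses its backslash.
broken = "\bin"
raw = r"\bin"          # raw string: the backslash is kept as typed

# Forward slashes avoid escapes entirely; normpath converts them for Windows.
command = ntpath.normpath("C:/Program Files/R/R-3.4.4/bin/Rscript")

print(len(broken), len(raw))   # 3 4
print(command)                 # C:\Program Files\R\R-3.4.4\bin\Rscript
```

Either a raw string or forward slashes plus normpath keeps the "\bin" part of the Rscript path intact.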

Exception disappears if I stop at a breakpoint

Python 3.6.2
The problem with the code below is that it raises an exception when run normally.
But when I step through it in the debugger, it works perfectly. The place where I stop in the debugger is marked as a breakpoint in the comments.
I tried the command both in the IDE and in the shell. The exception is raised in both, so the problem is not related to the IDE.
This situation shook me a bit.
I made a video of it: https://www.youtube.com/watch?v=OUcMpEzooDk
Could you give me a kick here? How can it be?
Comment on the code below (not related to the problem, but just for the most curious).
This is a utility for use with the Django web framework.
Users upload files, which are put in the media directory.
Of course, Django knows where the media directory is situated.
Django then keeps paths relative to media in the database. Something like this:
it_1/705fad82-2f68-4f3c-90c2-116da3ad9a40.txt
e5474da0-0fd3-4fa4-a85f-15c767ac32d4.djvu
I want to check that the files kept in media correspond exactly to the paths in the database: no extra files and no missing ones.
Code:
from pathlib import Path
class <Something>():
    def _reveal_lack_extra_files(self):
        path = os.path.join(settings.BASE_DIR, '../media/')
        image_files = Image.objects.values_list("file", flat=True)
        image_files = [Path(os.path.join(path, file)) for file in image_files]
        item_files = ItemFile.objects.values_list("file", flat=True)
        item_files = [Path(os.path.join(path, file)) for file in item_files]
        sheet_files = SheetFile.objects.values_list("file", flat=True)
        sheet_files = [Path(os.path.join(path, file)) for file in sheet_files]
        expected_files = set().union(image_files, item_files, sheet_files)
        real_files = set()
        glob_generator = list(Path(path).glob("**/*"))
        for posix_path in glob_generator:
            if os.path.isfile(posix_path._str): # Breakpoint
                real_files.add(posix_path)
        lack = expected_files.difference(real_files)
        extra = real_files.difference(expected_files)
        assert bool(lack) == False, "Lack of files: {}".format(lack)
        assert bool(extra) == False, "Extra files: {}".format(extra)
Traceback:
/home/michael/PycharmProjects/venv/photoarchive_4/bin/python /home/michael/Documents/pycharm-community-2017.1.5/helpers/pydev/pydevd.py --multiproc --qt-support --client 127.0.0.1 --port 43849 --file /home/michael/PycharmProjects/photoarchive_4/manage.py checkfiles
warning: Debugger speedups using cython not found. Run '"/home/michael/PycharmProjects/venv/photoarchive_4/bin/python" "/home/michael/Documents/pycharm-community-2017.1.5/helpers/pydev/setup_cython.py" build_ext --inplace' to build.
pydev debugger: process 3840 is connecting
Connected to pydev debugger (build 171.4694.67)
Traceback (most recent call last):
File "/home/michael/Documents/pycharm-community-2017.1.5/helpers/pydev/pydevd.py", line 1591, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/michael/Documents/pycharm-community-2017.1.5/helpers/pydev/pydevd.py", line 1018, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/michael/Documents/pycharm-community-2017.1.5/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/michael/PycharmProjects/photoarchive_4/manage.py", line 22, in <module>
execute_from_command_line(sys.argv)
File "/home/michael/PycharmProjects/venv/photoarchive_4/lib/python3.6/site-packages/django/core/management/__init__.py", line 364, in execute_from_command_line
utility.execute()
File "/home/michael/PycharmProjects/venv/photoarchive_4/lib/python3.6/site-packages/django/core/management/__init__.py", line 356, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/michael/PycharmProjects/venv/photoarchive_4/lib/python3.6/site-packages/django/core/management/base.py", line 283, in run_from_argv
self.execute(*args, **cmd_options)
File "/home/michael/PycharmProjects/venv/photoarchive_4/lib/python3.6/site-packages/django/core/management/base.py", line 330, in execute
output = self.handle(*args, **options)
File "/home/michael/PycharmProjects/photoarchive_4/general/management/commands/checkfiles.py", line 59, in handle
self._reveal_lack_extra_files()
File "/home/michael/PycharmProjects/photoarchive_4/general/management/commands/checkfiles.py", line 39, in _reveal_lack_extra_files
if os.path.isfile(posix_path._str):
AttributeError: _str
Process finished with exit code 1
You're using the _str attribute on paths, which is undocumented and not guaranteed to be set. In general, an underscore prefix indicates that this is a private attribute that should not be used by user code. If you want to convert a path to a string, just use str(the_path) instead.
But in this case, you don't need to do so: Path objects have an is_file method which you can call instead. Another possibility is to pass the Path object itself to the os.path.isfile function, which is supported on Python 3.6.
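A minimal sketch of the is_file approach, with the Django queries left out. The function names collect_real_files and compare_files are hypothetical, and the expected paths are assumed to be passed in already joined to the media directory.

```python
from pathlib import Path


def collect_real_files(media_root):
    """Return the set of regular files found anywhere under media_root."""
    # Path.is_file() is public API, unlike the private _str attribute.
    return {p for p in Path(media_root).glob("**/*") if p.is_file()}


def compare_files(expected_files, real_files):
    """Return (lack, extra): files missing on disk and files not expected."""
    expected = {Path(p) for p in expected_files}
    return expected - real_files, real_files - expected
```

Because nothing here touches private attributes, the result should not depend on whether a debugger has inspected the Path objects first.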

Is there a reliable way to get the path of the caller module from a Python function that is executed within a Sphinx conf.py?

I'm running some custom Python code in Sphinx and need to get the path to the caller module. (Essentially this is the caller's __file__ object; I need to interpret a filename relative to this location.)
I can get the filename from inspect.stack() as per How to use inspect to get the caller's info from callee in Python?, but apparently I need to interpret this filename in the context of the Python startup directory. (Sometimes inspect.stack()[k][1] is an absolute path but sometimes it is a relative path like conf.py; the inspect.stack() function doesn't seem to document this but unutbu claims in a comment that it is relative to the Python startup directory. )
Sphinx does some unintentionally evil things like this comment:
# This file is execfile()d with the current directory set to its
# containing dir.
so os.path.abspath(filename) doesn't work, and
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath('extensions'))
so sys.path[0] is corrupted by the time my code gets to it.
How do I find the startup directory in Python, if sys.path has been modified?
Or is there another way to get the path to the caller module?
If I run Jean-François Fabre's answer
for file,line,w1,w2 in traceback.extract_stack():
    sys.stdout.write(' File "{}", line {}, in {}\n'.format(file,line,w1))
I get this:
File "c:\app\python\anaconda\1.6.0\Scripts\sphinx-build-script.py", line 5, in <module>
File "c:\app\python\anaconda\1.6.0\lib\site-packages\Sphinx-1.4.1-py2.7.egg\sphinx\__init__.py", line 51, in main
File "c:\app\python\anaconda\1.6.0\lib\site-packages\Sphinx-1.4.1-py2.7.egg\sphinx\__init__.py", line 92, in build_main
File "c:\app\python\anaconda\1.6.0\lib\site-packages\Sphinx-1.4.1-py2.7.egg\sphinx\cmdline.py", line 243, in main
File "c:\app\python\anaconda\1.6.0\lib\site-packages\Sphinx-1.4.1-py2.7.egg\sphinx\application.py", line 155, in __init__
File "conf.py", line 512, in setup
[more lines elided, the conf.py is the one that matters]
so the problem is that I need to find the path to conf.py but the current directory has been changed by Sphinx so I can't just do os.path.abspath(caller_filename)
You can get what you want using the traceback module. I've written this sample code in PyScripter:
import traceback,sys
def demo():
    for file,line,w1,w2 in traceback.extract_stack():
        sys.stdout.write(' File "{}", line {}, in {}\n'.format(file,line,w1))
def foo():
    demo()
foo()
which gives on my Windows PC running PyScripter:
File "C:\Users\dartypc\AppData\Roaming\PyScripter\remserver.py", line 63, in <module>
File "C:\Users\dartypc\AppData\Roaming\PyScripter\remserver.py", line 60, in main
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\utils\server.py", line 227, in start
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\utils\server.py", line 139, in accept
File "C:\Users\dartypc\AppData\Roaming\PyScripter\remserver.py", line 14, in _accept_method
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\utils\server.py", line 191, in _serve_client
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\core\protocol.py", line 391, in serve_all
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\core\protocol.py", line 382, in serve
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\core\protocol.py", line 350, in _dispatch
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\core\protocol.py", line 298, in _dispatch_request
File "C:\Program Files\PyScripter\Lib\rpyc.zip\rpyc\core\protocol.py", line 528, in _handle_call
File "<string>", line 420, in run_nodebug
File "C:\DATA\jff\data\python\stackoverflow\simple_traceback.py", line 10, in <module>
File "C:\DATA\jff\data\python\stackoverflow\simple_traceback.py", line 8, in foo
File "C:\DATA\jff\data\python\stackoverflow\simple_traceback.py", line 4, in demo
Bah, I'm just going to get around the issue by allowing callers to pass in their __file__ value :-(
my function:
def do_something(app, filename, relroot=None):
    if relroot is None:
        relroot = '.'
    else:
        relroot = os.path.dirname(relroot)
    path = os.path.join(relroot, filename)
    ...
in conf.py:
def setup(app):
    mymodule.do_something(app, 'path/to/file', relroot=__file__)
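A small standalone sketch of this workaround, assuming the caller passes its own __file__; the helper name resolve_relative is hypothetical. One caveat, stated as an assumption rather than something tested against Sphinx: if __file__ is itself relative (like the conf.py seen in the traceback), the result is still relative to whatever the current directory is at call time.

```python
import os


def resolve_relative(filename, relroot=None):
    """Interpret filename relative to the directory containing relroot."""
    # relroot is expected to be the caller's __file__; fall back to the cwd.
    base = os.path.dirname(relroot) if relroot else ""
    return os.path.normpath(os.path.join(base or ".", filename))
```

In conf.py this would be called as resolve_relative('path/to/file', relroot=__file__).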

Using Python and lxml to validate XML against an external DTD

I'm trying to validate an XML file against an external DTD referenced in the doctype tag. Specifically:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
...the rest of the document...
I'm using Python 3.3 and the lxml module. From reading http://lxml.de/validation.html#validation-at-parse-time, I've thrown this together:
enexFile = open(sys.argv[2], mode="rb") # sys.argv[2] is the path to an XML file in local storage.
enexParser = etree.XMLParser(dtd_validation=True)
enexTree = etree.parse(enexFile, enexParser)
From what I understand of validation.html, the lxml library should now take care of retrieving the DTD and performing validation. But instead, I get this:
$ ./mapwrangler.py validate notes.enex
Traceback (most recent call last):
File "./mapwrangler.py", line 27, in <module>
enexTree = etree.parse(enexFile, enexParser)
File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
File "parser.pxi", line 1769, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102257)
File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:102516)
File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:101442)
File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:97069)
File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
File "parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91757)
lxml.etree.XMLSyntaxError: Validation failed: no DTD found !, line 3, column 43
This surprises me, because if I turn off validation, then the document parses in just fine and I can do print(enexTree.docinfo.doctype) to get
$ ./mapwrangler.py validate notes.enex
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
So it looks to me like there shouldn't be any problem finding the DTD.
Thanks for your help.
You need to add no_network=False when constructing the parser object. This option is set to True by default.
From the documentation of parser options at http://lxml.de/parsing.html#parsers:
no_network - prevent network access when looking up external documents (on by default)
For a reason I still don't know, my problem was related to where the XML catalog was located on my local file system.
In my case, I use an XML editor that has a tight integration with a component content management system (CCMS, in this case SDL Trisoft 2011 R2). When the editor connects to the CCMS, DTDs, catalog files and a bunch of other files are synced. These files end up on the local file system in:
C:\Users\[username]\AppData\Local\Trisoft\InfoShare Client\[id]\Config\DocTypes\catalog.xml
I could not get that to work. Simply COPYING the whole catalog to another location fixed things, and this works:
f = r"path/to/my/file.xml"
# set XML catalog file path
os.environ['XML_CATALOG_FILES'] = r'C:\DATA\Mydoctypes\catalog.xml'
# configure parser
parser = etree.XMLParser(dtd_validation=True, no_network=True)
# validate
try:
    valid = etree.parse(f, parser=parser)
    print("This file is valid against the DTD.")
except etree.XMLSyntaxError as error:
    print("This file is INVALID against the DTD!")
    print(error)
Obviously this is not ideal, but it works.
Could it be something to do with file permissions, or perhaps that good old "file path too long" problem in Windows? I have not tried whether a symbolic link would work.
I am using Windows 7, Python 2.7.11 and lxml 3.6.0.
