Python3 Windows 7 file path handling - python

I have fetched files from a Windows shared drive with the following path:
\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls
I want to extract the filename from this path, which is futurefilesomf.egus.xls.
When I tried file_path.split('\'), it raised SyntaxError: EOL while scanning string literal.
I can't use file_path.split('\\') because it gives me None.
Even file_path.replace('\\','\') raises the same error.
What could be the solution?

The question is tagged 3.x, so I'll assume you have Python 3.4+ and can use pathlib:
import pathlib
path = r"\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls"
print(pathlib.Path(path).name)
print(pathlib.Path(path).name == "futurefilesomf.egus.xls")
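One caveat worth illustrating: pathlib.Path only parses backslashes as separators when the code runs on Windows. A minimal sketch, assuming the same path may also be processed on another OS, is to use PureWindowsPath, which applies Windows rules everywhere. The variable names below are illustrative only.

```python
from pathlib import PureWindowsPath

# Raw string, so the backslashes are kept literally.
unc_path = r"\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls"

# PureWindowsPath never touches the file system; it only parses the string
# with Windows rules, regardless of the OS the script runs on.
parsed = PureWindowsPath(unc_path)
file_name = parsed.name    # last path component
share = parsed.drive       # for UNC paths, the \\server\share prefix
print(file_name)
```

Because no file system access is involved, this also works when the share itself is unreachable.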

Use basename instead of splitting:
>>> s = r"\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls"
>>> import os
>>> os.path.basename(s)
'futurefilesomf.egus.xls'

You can use ntpath:
full_path = r'\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls'
import ntpath
ntpath.split(full_path)
which gives:
('\\\\piyush123\\piyushtech$\\Piyush\\ProFileTesting\\May\\Input_File\\OMF', 'futurefilesomf.egus.xls')

You can do file_path.split('\\'). Do it like this:
>>> file_path=r"\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls"
>>> file_path.split('\\')
['', '', 'piyush123', 'piyushtech$', 'Piyush', 'ProFileTesting', 'May', 'Input_File', 'OMF', 'futurefilesomf.egus.xls']
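Though you probably really need to combine it with a function from the os.path family. For example (os.path.splitunc exists only on Windows and was removed in Python 3.7; os.path.splitdrive handles UNC paths there):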
>>> os.path.splitunc(file_path)
('\\\\piyush123\\piyushtech$', '\\Piyush\\ProFileTesting\\May\\Input_File\\OMF\\futurefilesomf.egus.xls')
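The original SyntaxError comes from '\' itself: the backslash escapes the closing quote, so the string literal never ends. The sketch below shows that '\\' is a single backslash character and, as an assumption about what a Python 3 reader would use in place of splitunc, calls ntpath.splitdrive, which treats the UNC prefix as the drive.

```python
import ntpath

file_path = r"\\piyush123\piyushtech$\Piyush\ProFileTesting\May\Input_File\OMF\futurefilesomf.egus.xls"

# '\\' is a one-character string containing a single backslash.
separator = '\\'
separator_length = len(separator)

# Splitting on it and taking the last piece yields the file name.
last_part = file_path.split(separator)[-1]

# ntpath applies Windows rules on any OS; for a UNC path the "drive"
# is the \\server\share prefix.
share, rest = ntpath.splitdrive(file_path)
print(last_part)
```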

Related

How to determine if a module name is part of python standard library

I have a module name as a string (e.g. 'logging') that was given by querying the module attribute of an object.
How can I differentiate between modules that are part of my project and modules that are part of python standard library?
I know that I can check if this module was installed by pip using pip.get_installed_distributions(), but these are not related to the standard library.
Note: I'm working on Python 2.7, so solutions that are valid only in Python 3.x are less relevant.
Unlike the answer here, I was looking for a solution that can be run in O(1) and will not require holding an array of results nor having to scan the directory for every query.
Thanks.
Quick 'n dirty solution, using the standard module imp:
import imp
import os.path
import sys
python_path = os.path.dirname(sys.executable)
my_mod_name = 'logging'
module_path = imp.find_module(my_mod_name)[1]
if 'site-packages' in module_path or python_path in module_path or not imp.is_builtin(my_mod_name):
    print('module', my_mod_name, 'is not included in standard python library')
EDIT:
I used the solution which is here.
import distutils.sysconfig as sysconfig
import os
def std_modules():
    ret_list = []
    std_lib = sysconfig.get_python_lib(standard_lib=True)
    for top, dirs, files in os.walk(std_lib):
        for nm in files:
            if nm != '__init__.py' and nm[-3:] == '.py':
                # os.sep instead of a hard-coded backslash, so nested modules
                # are also converted to dotted names outside Windows
                ret_list.append(os.path.join(top, nm)[len(std_lib)+1:-3].replace(os.sep, '.'))
    return ret_list
l = std_modules()
print("logging" in l)
print("os" in l)
Output:
False
True
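This works in Python 2 and in Python 3 up to 3.11; distutils was removed in Python 3.12. Note that logging is reported as False because it is a package and its __init__.py is skipped, so only its submodules (such as logging.handlers) end up in the list.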
BEFORE EDIT:
I guess, you can use Python Docs. Here are standard library parts of Python 2 Docs and Python 3 Docs. Also, you can select the exact version of Python.
None of the solutions above were what I wanted, so I did it yet another way. Posting here in case it's useful to anyone.
import os
def standard_lib_names_gen(include_underscored=False):
    standard_lib_dir = os.path.dirname(os.__file__)
    for filename in os.listdir(standard_lib_dir):
        if not include_underscored and filename.startswith('_'):
            continue
        filepath = os.path.join(standard_lib_dir, filename)
        name, ext = os.path.splitext(filename)
        if filename.endswith('.py') and os.path.isfile(filepath):
            if str.isidentifier(name):
                yield name
        elif os.path.isdir(filepath) and '__init__.py' in os.listdir(filepath):
            yield name
>>> standard_lib_names = set(standard_lib_names_gen(include_underscored=True))
>>> # verify that a few known libs are there (including three folders and three py files)
>>> assert {'collections', 'asyncio', 'os', 'dis', '__future__'}.issubset(standard_lib_names)
>>> # verify that other decoys are not in there
>>> assert {'__pycache__', 'LICENSE.txt', 'config-3.8-darwin', '.DS_Store'}.isdisjoint(standard_lib_names)
>>>
>>> len(standard_lib_names)
200
>>>
>>> # not including underscored
>>> standard_lib_names = set(standard_lib_names_gen(include_underscored=False))
>>> len(standard_lib_names)
184
There is also a good answer in an old duplicate of this post: https://stackoverflow.com/a/28873415/7262247
This is how you can do it (you need to pip install stdlib_list first):
from stdlib_list import stdlib_list
import sys
all_stdlib_symbols = stdlib_list('.'.join([str(v) for v in sys.version_info[0:2]]))
module_name = 'collections'
if module_name in all_stdlib_symbols:
    print("%s is in stdlib" % module_name)
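For readers on newer interpreters, there is also a built-in constant-time lookup. The sketch below assumes Python 3.10 or later, where sys.stdlib_module_names exists; it does not help with the Python 2.7 setup from the question, and the helper name is_stdlib_module is made up for illustration.

```python
import sys


def is_stdlib_module(module_name):
    """Return True or False when the interpreter can tell, None otherwise."""
    # sys.stdlib_module_names was added in Python 3.10; older versions lack it.
    known_names = getattr(sys, "stdlib_module_names", None)
    if known_names is None:
        return None
    # Only top-level names are listed, so "os.path" is looked up as "os".
    top_level = module_name.partition(".")[0]
    return top_level in known_names


print(is_stdlib_module("logging"))
```

The names are stored in a frozenset, so each query is a hash lookup rather than a directory scan.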

Syntax error when trying to run a Python file

File ex.py:
print('hello,word')
Trial run:
>>> d:\learning\python\ex.py
SyntaxError: invalid syntax
On the command line, outside the Python console, type:
python D:\learning\python\ex.py
You need to import the module:
>>> import ex
See the documentation about modules
The module has to be on your path or in the current working directory. You can change directories using chdir:
>>> import os
>>> os.chdir(r'd:\learning\python')
>>> import ex
hello,word
Or, in the command prompt, you can set PYTHONPATH=d:\learning\python, for example, before you start python, and d:\learning\python will get added to sys.path.
Alternatively, you can add it programmatically:
>>> import sys
>>> sys.path.insert(0, r'd:\learning\python')
>>> import ex
hello,word
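If the goal is to run the file from inside the interpreter without importing it as a module, the standard runpy module is another option. A minimal sketch follows; the temporary file is only a stand-in for d:\learning\python\ex.py, and the greeting variable is added here purely so the result can be inspected.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for ex.py, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "ex.py")
    with open(script_path, "w") as handle:
        handle.write("greeting = 'hello,word'\nprint(greeting)\n")

    # run_path executes the file by path and returns its globals as a dict,
    # so sys.path and the working directory do not need to change.
    script_globals = runpy.run_path(script_path)
```

In the interactive session from the question this would be runpy.run_path(r'd:\learning\python\ex.py').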

Python finding stdin filepath on Linux

How can I tell the file (or tty) that is attached to my stdios?
Something like:
>>> import sys
>>> print sys.stdin.__path__
'/dev/tty1'
>>>
I could look in proc:
import os, sys
os.readlink('/proc/self/fd/%s' % sys.stdin.fileno())
But seems like there should be a builtin way?
The sys.std* objects are standard Python file objects, so they have a name attribute and an isatty method:
>>> import sys
>>> sys.stdout.name
'<stdout>'
>>> sys.stdout.isatty()
True
>>> anotherfile = open('/etc/hosts', 'r')
>>> anotherfile.name
'/etc/hosts'
>>> anotherfile.isatty()
False
Short of telling you exactly what TTY device you got, that's the extent of the API offered by Python.
Got it!
>>> import os
>>> import sys
>>> print os.ttyname(sys.stdin.fileno())
'/dev/pts/0'
>>>
It raises OSError: [Errno 22] Invalid argument if stdin isn't a TTY, but that's easy enough to test for with isatty().
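The isatty() guard mentioned above can be wrapped in a small helper. This is a sketch for Unix-like systems, where os.ttyname is available; the function name tty_name_or_none is made up for illustration.

```python
import os
import sys


def tty_name_or_none(fd):
    """Return the terminal device path for fd, or None if fd is not a TTY."""
    # Checking first avoids the OSError that os.ttyname raises for
    # pipes, regular files and other non-terminal descriptors.
    if not os.isatty(fd):
        return None
    return os.ttyname(fd)


if __name__ == "__main__":
    print(tty_name_or_none(sys.stdin.fileno()))
```

When stdin is redirected from a file or a pipe the helper returns None, and the /proc/self/fd lookup from the question remains the way to find what is attached.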

Is there a standard way to list names of Python modules in a package?

Is there a straightforward way to list the names of all modules in a package, without using __all__?
For example, given this package:
/testpkg
/testpkg/__init__.py
/testpkg/modulea.py
/testpkg/moduleb.py
I'm wondering if there is a standard or built-in way to do something like this:
>>> package_contents("testpkg")
['modulea', 'moduleb']
The manual approach would be to iterate through the module search paths in order to find the package's directory. One could then list all the files in that directory, filter out the uniquely-named py/pyc/pyo files, strip the extensions, and return that list. But this seems like a fair amount of work for something the module import mechanism is already doing internally. Is that functionality exposed anywhere?
Using python2.3 and above, you could also use the pkgutil module:
>>> import pkgutil
>>> [name for _, name, _ in pkgutil.iter_modules(['testpkg'])]
['modulea', 'moduleb']
EDIT: Note that the parameter for pkgutil.iter_modules is not a list of modules, but a list of paths, so you might want to do something like this:
>>> import os.path, pkgutil
>>> import testpkg
>>> pkgpath = os.path.dirname(testpkg.__file__)
>>> print([name for _, name, _ in pkgutil.iter_modules([pkgpath])])
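The two steps above can be folded into the package_contents function from the question. This is a sketch rather than a standard API: it imports the package (so the package's __init__.py runs) and then searches the package's own __path__.

```python
import importlib
import pkgutil


def package_contents(package_name):
    """Return the sorted names of the modules directly inside a package."""
    package = importlib.import_module(package_name)
    # Packages expose __path__; plain modules do not.
    search_paths = getattr(package, "__path__", None)
    if search_paths is None:
        raise ImportError("Not a package: %r" % package_name)
    return sorted(name for _, name, _ in pkgutil.iter_modules(search_paths))


# Demonstrated on a standard library package, since testpkg is hypothetical.
json_modules = package_contents("json")
print(json_modules)
```

Using __path__ instead of os.path.dirname(__file__) also covers packages whose contents are spread over several directories.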
import module
help(module)
Maybe this will do what you're looking for?
import imp
import os
MODULE_EXTENSIONS = ('.py', '.pyc', '.pyo')
def package_contents(package_name):
    file, pathname, description = imp.find_module(package_name)
    if file:
        raise ImportError('Not a package: %r' % package_name)
    # Use a set because some may be both source and compiled.
    return set([os.path.splitext(module)[0]
        for module in os.listdir(pathname)
        if module.endswith(MODULE_EXTENSIONS)])
Don't know if I'm overlooking something, or if the answers are just out-dated but;
As stated by user815423426 this only works for live objects and the listed modules are only modules that were imported before.
Listing modules in a package seems really easy using inspect:
>>> import inspect, testpkg
>>> [name for name, _ in inspect.getmembers(testpkg, inspect.ismodule)]
['modulea', 'moduleb']
This is a recursive version that works with python 3.6 and above:
import importlib.util
from pathlib import Path
import os
MODULE_EXTENSIONS = '.py'
def package_contents(package_name):
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return set()
    pathname = Path(spec.origin).parent
    ret = set()
    with os.scandir(pathname) as entries:
        for entry in entries:
            if entry.name.startswith('__'):
                continue
            current = '.'.join((package_name, entry.name.partition('.')[0]))
            if entry.is_file():
                if entry.name.endswith(MODULE_EXTENSIONS):
                    ret.add(current)
            elif entry.is_dir():
                ret.add(current)
                ret |= package_contents(current)
    return ret
There is a __loader__ variable inside each package instance. So, if you import the package, you can find the "module resources" inside the package:
import testpkg  # replace with your package name
for mod in testpkg.__loader__.get_resource_reader().contents():
    print(mod)
You can of course improve the loop to find the "module" name:
import testpkg
from pathlib import Path
for mod in testpkg.__loader__.get_resource_reader().contents():
    # You can filter the name like
    # Path(mod).suffix not in (".py", ".pyc")
    print(Path(mod).stem)
Inside the package, you can find your modules by directly using __loader__ of course.
This should list the modules:
help("modules")
If you would like to view information about your package outside of Python code (from a command prompt), you can use pydoc.
# get a full list of packages that you have installed on your machine
$ python -m pydoc modules
# get information about a specific package
$ python -m pydoc <your package>
You get the same result as pydoc inside the interpreter by using help:
>>> import <my package>
>>> help(<my package>)
Based on cdleary's example, here's a recursive version listing path for all submodules:
import imp, os
def iter_submodules(package):
    file, pathname, description = imp.find_module(package)
    for dirpath, _, filenames in os.walk(pathname):
        for filename in filenames:
            if os.path.splitext(filename)[1] == ".py":
                yield os.path.join(dirpath, filename)
The other answers here will run the code in the package as they inspect it. If you don't want that, you can grep the files like this answer
import re
from typing import List

def _get_class_names(file_name: str) -> List[str]:
    """Get the python class name defined in a file without running code
    file_name: the name of the file to search for class definitions in
    return: all the classes defined in that python file, empty list if no matches"""
    defined_class_names = []
    # search the file for class definitions
    with open(file_name, "r") as file:
        for line in file:
            # regular expression for class defined in the file
            # searches for text that starts with "class" and ends with ( or :,
            # whichever comes first
            match = re.search(r"^class(.+?)(\(|:)", line)  # noqa
            if match:
                # add the cleaned match to the list if there is one
                defined_class_name = match.group(1).strip()
                defined_class_names.append(defined_class_name)
    return defined_class_names
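A parser-based variant of the same idea is sketched below. It swaps the regular expression for the standard ast module, which also reads the source without executing it; unlike the line-based pattern, it picks up indented (nested) class definitions too. The function name get_class_names and the sample source are illustrative only.

```python
import ast


def get_class_names(source_text):
    """Return the names of all classes defined in source_text, without running it."""
    tree = ast.parse(source_text)
    # ast.walk visits every node, so nested classes are included as well.
    return [node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]


sample_source = (
    "class Alpha(object):\n"
    "    pass\n"
    "\n"
    "class Beta:\n"
    "    class Inner:\n"
    "        pass\n"
)
class_names = get_class_names(sample_source)
print(class_names)
```

To use it on a file, read the file's text and pass it in. Note that ast.parse raises SyntaxError for files that are not valid for the running Python version.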
To complete @Metal3d's answer: yes, you can call testpkg.__loader__.get_resource_reader().contents() to list the "module resources", but it only works if you imported your package in the "normal" way and your loader is a _frozen_importlib_external.SourceFileLoader object.
But if you imported your library with zipimport (e.g. to load your package in memory), your loader will be a zipimporter object, and its get_resource_reader function differs from importlib's; it requires a "fullname" argument.
To make it work with both loaders, pass your package name as the argument to get_resource_reader:
# An example with CrackMapExec tool
import importlib
import cme.protocols as cme_protocols
class ProtocolLoader:
    def get_protocols(self):
        protocols = {}
        protocols_names = [x for x in cme_protocols.__loader__.get_resource_reader("cme.protocols").contents()]
        for prot_name in protocols_names:
            prot = importlib.import_module(f"cme.protocols.{prot_name}")
            protocols[prot_name] = prot
        return protocols
def package_contents(package_name):
    package = __import__(package_name)
    return [module_name for module_name in dir(package) if not module_name.startswith("__")]

How to get/set logical directory path in python

In Python, is it possible to get or set a logical directory (as opposed to an absolute one)?
For example if I have:
/real/path/to/dir
and I have
/linked/path/to/dir
linked to the same directory.
Using os.getcwd and os.chdir will always use the absolute path:
>>> import os
>>> os.chdir('/linked/path/to/dir')
>>> print os.getcwd()
/real/path/to/dir
The only way I have found to get around this at all is to launch 'pwd' in another process and read the output. However, this only works until you call os.chdir for the first time.
The underlying operating system / shell reports real paths to Python.
So there really is no way around it, since os.getcwd() is a wrapped call to the C library's getcwd() function.
There are some workarounds in the spirit of the one you already know, which is launching pwd.
Another one involves using os.environ['PWD']. If that environment variable is set, you can write a getcwd function that respects it.
The solution below combines both:
import os
from subprocess import Popen, PIPE
class CwdKeeper(object):
    def __init__(self):
        self._cwd = os.environ.get("PWD")
        if self._cwd is None: # no environment. fall back to calling pwd on shell
            self._cwd = Popen('pwd', stdout=PIPE).communicate()[0].strip()
        self._os_getcwd = os.getcwd
        self._os_chdir = os.chdir
    def chdir(self, path):
        if not self._cwd:
            return self._os_chdir(path)
        p = os.path.normpath(os.path.join(self._cwd, path))
        result = self._os_chdir(p)
        self._cwd = p
        os.environ["PWD"] = p
        return result
    def getcwd(self):
        if not self._cwd:
            return self._os_getcwd()
        return self._cwd
cwd = CwdKeeper()
print cwd.getcwd()
# use only cwd.chdir and cwd.getcwd from now on.
# monkeypatch os if you want:
os.chdir = cwd.chdir
os.getcwd = cwd.getcwd
# now you can use os.chdir and os.getcwd as normal.
This also does the trick for me:
import os
os.popen('pwd').read().strip('\n')
Here is a demonstration in python shell:
>>> import os
>>> os.popen('pwd').read()
'/home/projteam/staging/site/proj\n'
>>> os.popen('pwd').read().strip('\n')
'/home/projteam/staging/site/proj'
>>> # Also works if PWD env var is set
>>> os.getenv('PWD')
'/home/projteam/staging/site/proj'
>>> # This gets actual path, not symlinked path
>>> import subprocess
>>> p = subprocess.Popen('pwd', stdout=subprocess.PIPE)
>>> p.communicate()[0] # returns non-symlink path
'/home/projteam/staging/deploys/20150114-141114/site/proj\n'
Getting the environment variable PWD didn't always work for me so I use the popen method. Cheers!
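Since PWD can be missing or stale, one way to combine the two sources is to trust PWD only when it still refers to the directory the process is actually in. The sketch below is written for Python 3 and the helper name logical_cwd is made up for illustration; it is one possible approach, not something the answers above prescribe.

```python
import os


def logical_cwd():
    """Return $PWD when it still names the current directory, else os.getcwd()."""
    physical = os.getcwd()
    candidate = os.environ.get("PWD")
    if candidate and os.path.isabs(candidate):
        try:
            # samefile compares the underlying directory, so a symlinked
            # path that resolves to the current directory is accepted.
            if os.path.samefile(candidate, physical):
                return candidate
        except OSError:
            pass  # PWD names something that does not exist any more
    return physical


print(logical_cwd())
```

Keep in mind that os.chdir does not update PWD, so after changing directory from Python the helper falls back to the physical path unless the script sets os.environ["PWD"] itself, as CwdKeeper does.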
