get description of an installed package without actual importing it

get description of an installed package without actual importing it - python

If you type this:
import somemodule
help(somemodule)
it will print out paged package description. I would need to get the same description as a string but without importing this package to the current namespace. Is this possible? It surely is, because anything is possible in Python, but what is the most elegant/pythonic way of doing so?
Side note: by elegant way I mean without opening a separate process and capturing its stdout... ;)
In other words, is there a way to peek into a unimported but installed package and get its description? Maybe something with importlib.abc.InspectLoader? But I have no idea how to make it work the way I need.
UPDATE: I need not just not polluting the namespace but also do this without leaving any traces of itself or dependent modules in memory and in sys.modules etc. Like it was never really imported.
UPDATE: Before anyone asks me why I need it - I want to list all installed python packages with their description. But after this I do not want to have them imported in sys.modules nor occupying excessive space in memory because there can be a lots of them.

The reason that you will need to import the module to get a help string is that in many cases, the help strings are actually generated in code. It would be pointlessly difficult to parse the text of such a package to get the string since you would then have to write a small Python interpreter to reconstruct the actual string.
That being said, there are ways of completely deleting a temporarily imported modules based on this answer, which summarizes a thread that appeared on the Python mailing list around 2003: http://web.archive.org/web/20080926094551/http://mail.python.org/pipermail/python-list/2003-December/241654.html. The methods described here will generally only work if the module is not referenced elsewhere. Otherwise the module will be unloaded in the sense that import will reload it from scratch instead of using the existing sys.modules entry, but the module will still live in memory.
Here is a function that does approximately what you want and even prints a warning if the module does not appear to have been unloaded. Unlike the solutions proposed in the linked answer, this function really handles all the side-effects of loading a module, including the fact that importing one package may import other external packages into sys.modules:
import sys, warnings
def get_help(module_name):
modules_copy = sys.modules.copy()
module = __import__(module_name)
h = help(module)
for modname in list(sys.modules):
if modname not in modules_copy:
del sys[modname]
if sys.getrefcount(module) > 1:
warnings.warn('Module {} is likely not to be completely wiped'.format(module_name))
del module
return h
The reason that I make a list of the keys in the final loop is that it is inadvisable to modify a dictionary (or any other iterable) as you iterate through it. At least in Python 3, dict.keys() returns an iterable that is backed by the dictionary itself, not a frozen copy. I am not sure if h = ... and return h are even necessary, but in the worst case, h is just None.

Well, if you are only worried about keeping the global namespace tidy, you could always import in a function:
>>> def get_help():
... import math
... help(math)
...
>>> math
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined

I would suggest a different approach, if i understand you correctly, you wish to read a portion of a package, without importing it (even within a function with local scope). I would suggest a method to do so would be via accessing the (python_path)/Lib/site-packages/(package_name)/ and reading the contents of the respective files as an alternative to importing the module so Python can.

Related

Defining a module from within a module [duplicate]

I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. EG
context = { a: 1, b: 2 }
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the testit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not in python's include path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However this doesn't really answer my question, because really, what if you're running python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?

Hmm, well one thing I can tell you is that the timeit function actually executes its code using the module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.

Partial Imports: Import everything up to a error

Is there a way to perform a python import which is not atomic?
For instance, I have a file as follows:
# Filename: a.py
myvariable = 1
mylist = [1, 2, 3]
raise ImportError
donotimportthis = 5
I then have a separate file which does the following:
import a
a.myvariable == 1 # This is okay as it imported it
a.donotimportthis # <-- raise an exception as this is not imported.
I have a file which contains some python code, this follows the format of:
...variables...
import X
I do not have X installed nor do I want it however I do want the variables.
Note: This file is autogenerated not by me but by a tool whose version is frozen.

Two choices, in descending order of preference:
Change the autogeneration process. Instead of invoking proprietary_autogen_process, invoke custom_autogen_wrapper. This wrapper in turn first invokes the proprietary third-party tool, and then modifies the produced module source code by searching for the code that imports module X, and deletes everything after it.
This is relatively straightforward. You just need to take some care to not introduce false positives or false negatives by performing too loose (or too strict) matching of the import code. Ideally you’d use an AST rewriter but that’s probably overkill; a regular expression search for import X might work, although it will yield wrong results if this text appears inside a comment, string literal or inside a method which isn’t executed.
Generate a stub module X in a location where it will be found by the autogenerated module when importing the latter. I don’t recommend this because it’s tedious: You probably can’t just generate an empty module, since the autogenerated module will want to use X. You need to generate meaningful method stubs.

You can do specific imports with
from a import myvariable
EDIT: The above won't work if anything that is flat in the file raises an error. If you have no way to edit the imported file then I don't know if there is a (resonable) solution to this. Sorry didn't realise.
(an unreasonable solution would be to read in the file as text, slice it, and then run eval on it).
Or, as mentioned in the comments, put the stuff you don't want under
if __name__=="__main__":
<here>
Then it will only be invoked if you run the file directly.

What you can do is removing the donotimportthis variable at the end of the module, as follows: del donotimportthis. I hope it helps

Does importing specific class from file instead of full file matters?

Most of the tutorials and books about Django or Flask import specific classes from files instead of importing the whole file.
For example, importing DataRequiered validator from wrtforms.validators is done via from wtforms import validators instead of importing it via import wtforms.validators as valids and then accessing DataRequiered with valids.DataRequiered.
My question is: Is there an reason for this ?
I thought to something like avoiding the loading a whole module for computation/memory optimization (is it really relevant?) ? Or is it simply to make the code more readable ?

My question is: Is there an reason for this ?
from module_or_package import something is the canonical pythonic idiom (when you only want to import something in your current namespace of course).
Also, import module_or_package.something only works if module_or_package is a package and something a submodule, it raises an ImportError(No module named something) if something is a function, class or whatever object defined in module_or_package, as can be seen in the stdlib with os.path (which is a submodule of the os.package) vs datetime.date (which is a class defined in the datetime module):
>>> import os.path as p
>>> p
<module 'posixpath' from '/home/bruno/.virtualenvs/blook/lib/python2.7/posixpath.pyc'>
vs
>>>import datetime.date as d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named date
thought to something like avoiding the loading a whole module for computation/memory optimization (is it really relevant?)
Totally irrelevant - importing a given name from a module requires importing the whole module. Actually, this:
from module_or_package import something
is only syntactic sugar for
import module_or_package
something = module_or_package.something
del module_or_package
EDIT: You mention in a comment that
Right, but importing the whole module means loading it to the memory, which can be a reason for importing only a submodule/class
so it seems I failed to make the point clear: in Python, you can not "import only a submodule/class", period.
In Python, import, class and def are all executable statements (and actually just syntactic sugar for operation you can do 'manually' with functions and classes). Importing a module actually consists in executing all the code at the module's top-level (which will instanciate function and class objects) and create a module object (instance of module type) which attributes will be all names defined at the top-level via import, def and class statements or via explicit assignment. It's only when all this has been done that you can access any name defined in the module, and this is why, as I stated above,
from module import obj
is only syntactic sugar for
import module
obj = module.obj
del module
But (unless you do something stupid like defining a terabyte-huge dict or list in your module) this doesn't actually take that much time nor eat much ram, and a module is only effectively executed once per process the first time it's imported - then it's cached in sys.modules so subsequent imports only fetch it from cache.
Also, unless you actively prevents it, Python will cache the compiled version of the module (the .pyc files) and only recompile it if the .pyc is missing or older than the source .py file.
wrt/ packages and submodules, importing a submodule will also execute the package's __init__.py and build a module instance from it (IOW, at runtime, a package is also a module). Package initializer are canonically rather short, and actually quite often empty FWIW...

It depends, in the tutorial that was probably done for readability
Usually if you use most of the classes in a file, you import the file. If the files contains many classes but you only need a few, just import those.
It's both a matter of readability and optimization.

python: is there a disadvantage to from package import * besides namespace collision

I'm creating a class to extend a package, and prior to class instantiation I don't know which subset of the package's namespace I need. I've been careful about avoiding namespace conflicts in my code, so, does
from package import *
create problems besides name conflicts?
Is it better to examine the class's input and import only the names I need (at runtime) in the __init__ ??
Can python import from a set [] ?
does
for name in [namespace,namespace]:
from package import name
make any sense?
I hope this question doesn't seem like unnecessary hand-ringing, i'm just super new to python and don't want to do the one thing every 'beginnger's guide' says not to do (from pkg import * ) unless I'm sure there's no alternative.
thoughts, advice welcome.

In order:
It does not create other problems - however, name conflicts can be much more of a problem than you'd expect.
Definitely defer your imports if you can. Even though Python variable scoping is simplistic, you also gain the benefit of not having to import the module if the functionality that needs it never gets called.
I don't know what you mean. Square brackets are used to make lists, not sets. You can import multiple names from a module in one line - just use a comma-delimited list:
from awesome_module import spam, ham, eggs, baked_beans
# awesome_module defines lots of other names, but they aren't pulled in.
No, that won't do what you want - name is an identifier, and as such, each time through the loop the code will attempt to import the name name, and not the name that corresponds to the string referred to by the name variable.
However, you can get this kind of "dynamic import" effect, using the __import__ function. Consult the documentation for more information, and make sure you have a real reason for using it first. We get into some pretty advanced uses of the language here pretty quickly, and it usually isn't as necessary as it first appears. Don't get too clever. We hates them tricksy hobbitses.

When importing * you get everything in the module dumped straight into your namespace. This is not always a good thing as you could accentually overwrite something like;
from time import *
sleep = None
This would render the time.sleep function useless...
The other way of taking functions, variables and classes from a module would be saying
from time import sleep
This is a nicer way but often the best way is to just import the module and reference the module directly like
import time
time.sleep(3)

you can import like from PIL import Image, ImageDraw
what is imported by from x import * is limited to the list __all__ in x if it exists
importing at runtime if the module name isn't know or fixed in the code must be done with __import__ but you shouldn't have to do that

This syntax constructions help you to avoid any name collision:
from package import somename as another_name
import package as another_package_name

Python, unit-testing and mocking imports

I am in a project where we are starting refactoring some massive code base. One problem that immediately sprang up is that each file imports a lot of other files. How do I in an elegant way mock this in my unit test without having to alter the actual code so I can start to write unit-tests?
As an example: The file with the functions I want to test, imports ten other files which is part of our software and not python core libs.
I want to be able to run the unit tests as separately as possible and for now I am only going to test functions that does not depend on things from the files that are being imported.
Thanks for all the answers.
I didn't really know what I wanted to do from the start but now I think I know.
Problem was that some imports was only possible when the whole application was running because of some third-party auto-magic. So I had to make some stubs for these modules in a directory which I pointed out with sys.path
Now I can import the file which contains the functions I want to write tests for in my unit-test file without complaints about missing modules.

If you want to import a module while at the same time ensuring that it doesn't import anything, you can replace the __import__ builtin function.
For example, use this class:
class ImportWrapper(object):
def __init__(self, real_import):
self.real_import = real_import
def wrapper(self, wantedModules):
def inner(moduleName, *args, **kwargs):
if moduleName in wantedModules:
print "IMPORTING MODULE", moduleName
self.real_import(*args, **kwargs)
else:
print "NOT IMPORTING MODULE", moduleName
return inner
def mock_import(self, moduleName, wantedModules):
__builtins__.__import__ = self.wrapper(wantedModules)
try:
__import__(moduleName, globals(), locals(), [], -1)
finally:
__builtins__.__import__ = self.real_import
And in your test code, instead of writing import myModule, write:
wrapper = ImportWrapper(__import__)
wrapper.mock_import('myModule', [])
The second argument to mock_import is a list of module names you do want to import in inner module.
This example can be modified further to e.g. import other module than desired instead of just not importing it, or even mocking the module object with some custom object of your own.

If you really want to muck around with the python import mechanism, take a look at the ihooks module. It provides tools for changing the behavior of the __import__ built-in. But it's not clear from your question why you need to do this.

"imports a lot of other files"? Imports a lot of other files that are part of your customized code base? Or imports a lot of other files that are part of the Python distribution? Or imports a lot of other open source project files?
If your imports don't work, you have a "simple" PYTHONPATH problem. Get all of your various project directories onto a PYTHONPATH that you can use for testing. We have a rather complex path, in Windows we manage it like this
#set Part1=c:\blah\blah\blah
#set Part2=c:\some\other\path
#set that=g:\shared\stuff
set PYTHONPATH=%part1%;%part2%;%that%
We keep each piece of the path separate so that we (a) know where things come from and (b) can manage change when we move things around.
Since the PYTHONPATH is searched in order, we can control what gets used by adjusting the order on the path.
Once you have "everything", it becomes a question of trust.
Either
you trust something (i.e., the Python code base) and just import it.
Or
You don't trust something (i.e., your own code) and you
test it separately and
mock it for stand-alone testing.
Would you test the Python libraries? If so, you've got a lot of work. If not, then, you should perhaps only mock out the things you're actually going to test.

No difficult manipulation is necessary if you want a quick-and-dirty fix before your unit-tests.
If the unit tests are in the same file as the code you wish to test, simply delete unwanted module from the globals() dictionary.
Here is a rather lengthy example: suppose you have a module impp.py with contents:
value = 5
Now, in your test file you can write:
>>> import impp
>>> print globals().keys()
>>> def printVal():
>>> print impp.value
['printVal', '__builtins__', '__file__', 'impp', '__name__', '__doc__']
Note that impp is among the globals, because it was imported. Calling the function printVal that uses impp module still works:
>>> printVal()
5
But now, if you remove impp key from globals()...
>>> del globals()['impp']
>>> print globals().keys()
['printVal', '__builtins__', '__file__', '__name__', '__doc__']
...and try to call printVal(), you'll get:
>>> printVal()
Traceback (most recent call last):
File "test_imp.py", line 13, in <module>
printVal()
File "test_imp.py", line 5, in printVal
print impp.value
NameError: global name 'impp' is not defined
...which is probably exactly what you're trying to achieve.
To use it in your unit-tests, you can delete the globals just before running the test suite, e.g. in __main__:
if __name__ == '__main__':
del globals()['impp']
unittest.main()

In your comment above, you say you want to convince python that certain modules have already been imported. This still seems like a strange goal, but if that's really what you want to do, in principle you can sneak around behind the import mechanism's back, and change sys.modules. Not sure how this'd work for package imports, but should be fine for absolute imports.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

get description of an installed package without actual importing it - python

Well, if you are only worried about keeping the global namespace tidy, you could always import in a function: >>> def get_help(): ... import math ... help(math) ... >>> math Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'math' is not defined

Related

Defining a module from within a module [duplicate]

Partial Imports: Import everything up to a error

Does importing specific class from file instead of full file matters?

python: is there a disadvantage to from package import * besides namespace collision

Python, unit-testing and mocking imports

Categories

Resources