Enforcing garbage collection of module level variables? - python

Say I load a big data file used within a module, but I only care about something derived from it that requires very little memory. What is the best way to structure the module so that it doesn't keep unwanted data in memory when I import it?
Something like:
# getstuff.py
from importlib_resources import files
data = files('path.to.file').joinpath('big.one').read_text()
stuff_i_care_about = some_complicated_operation(data)
# __init__.py
from .getstuff import stuff_i_care_about
What is the best way to make sure that data is freed? del + gc.collect()? Wrap it in a function? Might it be freed automatically in some versions anyway?

Related

Get the description of an installed package without actually importing it

If you type this:
import somemodule
help(somemodule)
it will print out the package description in a pager. I need to get the same description as a string, but without importing this package into the current namespace. Is this possible? It surely is, because anything is possible in Python, but what is the most elegant/pythonic way of doing so?
Side note: by elegant way I mean without opening a separate process and capturing its stdout... ;)
In other words, is there a way to peek into an installed but unimported package and get its description? Maybe something with importlib.abc.InspectLoader? But I have no idea how to make it work the way I need.
UPDATE: I need not only to avoid polluting the namespace, but also to do this without leaving any trace of the package or its dependent modules in memory, in sys.modules, etc., as if it had never really been imported.
UPDATE: Before anyone asks why I need it: I want to list all installed Python packages with their descriptions. But afterwards I do not want them imported in sys.modules or occupying excessive memory, because there can be a lot of them.
The reason that you will need to import the module to get a help string is that in many cases, the help strings are actually generated in code. It would be pointlessly difficult to parse the text of such a package to get the string since you would then have to write a small Python interpreter to reconstruct the actual string.
That being said, there are ways of completely deleting a temporarily imported module, based on this answer, which summarizes a thread that appeared on the Python mailing list around 2003: http://web.archive.org/web/20080926094551/http://mail.python.org/pipermail/python-list/2003-December/241654.html. The methods described there will generally only work if the module is not referenced elsewhere. Otherwise the module will be unloaded in the sense that import will reload it from scratch instead of using the existing sys.modules entry, but the module will still live in memory.
Here is a function that does approximately what you want and even prints a warning if the module does not appear to have been unloaded. Unlike the solutions proposed in the linked answer, this function really handles all the side-effects of loading a module, including the fact that importing one package may import other external packages into sys.modules:
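import importlib
import sys
import warnings
def get_help(module_name):
    modules_copy = sys.modules.copy()
    # import_module returns the submodule itself for dotted names like 'a.b'
    module = importlib.import_module(module_name)
    h = help(module)
    for modname in list(sys.modules):
        if modname not in modules_copy:
            del sys.modules[modname]
    # In CPython, 2 = the local name `module` + getrefcount's temporary argument
    if sys.getrefcount(module) > 2:
        warnings.warn('Module {} is likely not to be completely wiped'.format(module_name))
    del module
    return h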
The reason that I make a list of the keys in the final loop is that it is inadvisable to modify a dictionary (or any other collection) while iterating over it. In Python 3, iterating directly over a dict or over dict.keys() (a view backed by the dictionary itself, not a frozen copy) while deleting keys raises a RuntimeError. Note also that help() prints the documentation and always returns None, so h is just None here; pydoc.render_doc(module, renderer=pydoc.plaintext) returns the same text as a string.
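If the goal is the description as a string rather than printed output, a variant of the function above could call pydoc.render_doc, which returns the text that help() would display. This is a sketch rather than a drop-in replacement. The name get_help_text is hypothetical. The refcount threshold of 2 assumes CPython, where the temporary argument to sys.getrefcount is counted.

```python
import importlib
import pydoc
import sys
import warnings


def get_help_text(module_name):
    """Return the help() text of module_name as a string, then remove every
    module that importing it added to sys.modules."""
    modules_before = set(sys.modules)
    module = importlib.import_module(module_name)
    # plaintext renders without the backspace-based bold of the default renderer
    text = pydoc.render_doc(module, renderer=pydoc.plaintext)
    for name in list(sys.modules):
        if name not in modules_before:
            del sys.modules[name]
    # In CPython, 2 = the local name `module` + getrefcount's temporary argument
    if sys.getrefcount(module) > 2:
        warnings.warn('Module {} is likely not to be completely wiped'.format(module_name))
    del module
    return text
```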
Well, if you are only worried about keeping the global namespace tidy, you could always import in a function:
>>> def get_help():
...     import math
...     help(math)
...
>>> math
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined
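Importing inside a function only keeps the name out of the global namespace. The module object is still cached in sys.modules, which the question's second UPDATE wants to avoid. A quick check:

```python
import sys


def get_doc():
    import math  # the name `math` is bound only inside this function
    return math.__doc__


doc = get_doc()
print('math' in globals())    # False: the global namespace stays tidy
print('math' in sys.modules)  # True: the module is still cached
```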
I would suggest a different approach. If I understand you correctly, you wish to read part of a package without importing it (even within a function with local scope). One way to do so is to access (python_path)/Lib/site-packages/(package_name)/ and read the contents of the relevant files directly, as an alternative to importing the module.
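Here is a sketch of that idea using only the standard library. importlib.util.find_spec locates a top-level module without executing it, and ast.get_docstring pulls the module docstring out of the parsed source. As the first answer points out, documentation that is built at runtime will be missed. Modules without a .py source file (built-in, frozen, or compiled extensions) return None here. The function name is hypothetical.

```python
import ast
import importlib.util
import tokenize


def get_docstring_without_import(module_name):
    """Read a module's docstring from its source file without executing it."""
    # For a top-level name, find_spec() only locates the module; it does not run it.
    # (For a dotted name such as 'pkg.sub', find_spec() imports the parent 'pkg'.)
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not spec.origin.endswith('.py'):
        return None  # not found, built-in, frozen, namespace package or extension
    with tokenize.open(spec.origin) as f:  # honours the file's encoding declaration
        tree = ast.parse(f.read(), filename=spec.origin)
    return ast.get_docstring(tree)
```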

Dynamically create Ctypes in Python

I have a file that I read from which has definitions of ctypes that are used in a separate project. I can read the file and obtain all the necessary information to create a ctype that I want in Python, like the name, fields, bitfields, ctype base class (Structure, Union, Enum, etc.), and pack.
I want to be able to create a ctype class from the information above. I also want these ctypes to be pickleable.
I currently have two solutions, both of which I feel like are hacks.
Solution 1
Generate Python code for the ctypes class in the appropriate format, by hand or with a template engine such as Jinja2, and then evaluate that code.
This solution has the downside of using eval. I always try to stay away from eval and I don't feel like this is a good place to use it.
Solution 2
Create a ctype dynamically in a function like so:
import ctypes
def create_ctype_class(name, base, fields, pack):
    class CtypesStruct(base):
        _pack_ = pack
        _fields_ = fields
    CtypesStruct.__name__ = name
    return CtypesStruct
ctype = create_ctype_class('ctype_struct_name', ctypes.Structure,
                           [('field1', ctypes.c_uint8)], 4)
This solution isn't so bad, but setting the name of the class is ugly and the type cannot be pickled.
Is there a better way of creating a dynamic ctype class?
Note: I am using Python 2.7
Solution 2 is probably your better option, though if you're also writing such classes statically, you may want to use a metaclass to deduplicate some of that code. If you need your objects to be pickleable, then you'll need a way to reconstruct them from pickleable objects. Once you've implemented such a mechanism, you can make the pickle module aware of it with a __reduce__() method.
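Here is a minimal sketch of Solution 2 that builds the class with the three-argument type() call, so the name is set directly instead of patched afterwards. To make the class pickleable, it takes a simpler route than the __reduce__() mechanism mentioned above. It registers the class as a global of the defining module under its own name, on the assumption that overwriting a global of that name is acceptable. pickle stores classes by module name and class name, so it can then find the class again. Unpickling in another process still requires the class to be re-created there first.

```python
import ctypes
import pickle


def create_ctype_class(name, base, fields, pack):
    """Create a ctypes class named `name` and expose it as a module global.

    pickle stores classes by reference (module name + class name), so the
    class must be reachable under that name in the module given in __module__.
    """
    cls = type(name, (base,), {
        '_pack_': pack,
        '_fields_': fields,
        '__module__': __name__,  # the module pickle will look the class up in
    })
    globals()[name] = cls  # assumption: overwriting a global of this name is fine
    return cls


Point = create_ctype_class('HypotheticalPoint', ctypes.Structure,
                           [('x', ctypes.c_uint8), ('y', ctypes.c_uint16)], 1)
p = Point(3, 500)
restored = pickle.loads(pickle.dumps(p))
print(restored.x, restored.y)  # 3 500 (Python 3)
```

Instances of ctypes structures that contain pointers cannot be pickled this way.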
I would go with a variant of Solution 1. Instead of evaling code, create a directory with an __init__.py (i.e. a package), add it to your sys.path and write out an entire python module containing all of the classes. Then you can import them from a stable namespace which will make pickle happier.
You can either take the output and add it to your app's source code or dynamically recreate it and cache it on a target machine at runtime.
pywin32 uses an approach like this for caching classes generated from ActiveX interfaces.
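Here is a minimal sketch of that approach, with simplifying assumptions. It writes a single module into a directory you choose, rather than a package with an __init__.py. It supports only ctypes.Structure, with fields given as names of ctypes types. It then imports the result with importlib. write_ctypes_module and the struct_defs layout are hypothetical. Because the classes live in a real, importable module, pickle can find them by module and class name.

```python
import importlib
import os
import sys


def write_ctypes_module(module_name, struct_defs, directory):
    """Write `directory/module_name.py` defining ctypes.Structure subclasses,
    put `directory` on sys.path and import the generated module.

    struct_defs: list of (class_name, pack, [(field_name, ctypes_type_name), ...]),
    where ctypes_type_name is an attribute of ctypes such as 'c_uint8'.
    """
    lines = ['import ctypes', '']
    for class_name, pack, fields in struct_defs:
        field_src = ', '.join('(%r, ctypes.%s)' % (f, t) for f, t in fields)
        lines += [
            '',
            'class %s(ctypes.Structure):' % class_name,
            '    _pack_ = %d' % pack,
            '    _fields_ = [%s]' % field_src,
        ]
    path = os.path.join(directory, module_name + '.py')
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    if directory not in sys.path:
        sys.path.insert(0, directory)
    if hasattr(importlib, 'invalidate_caches'):  # Python 3.3+: notice the new file
        importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

On later runs, the cached module can simply be imported from that directory instead of being regenerated.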

Can I make a dictionary available to multiple python scripts?

I have a dictionary of addresses with their usernames and passwords listed that looks something like this:
address_dict = {'address1':{'username':'abc', 'password':'123'}, 'address2':{'username':'xyz', 'password':'456'}}
Is there a way to make this dictionary accessible for multiple scripts to read from and possibly write to? Like save it as a separate Python file and import it or something?
Yes, you can do just that:
# module.py
address_dict = {'address1':{'username':'abc', 'password':'123'}, 'address2':{'username':'xyz', 'password':'456'}}
# main.py
import module
print(module.address_dict)
If you don't like the module. prefix, you could import the dictionary like so:
from module import address_dict
print(address_dict)
To access it and modify it at runtime, you can just define it in a module and then import it. But if you want your changes to be persistent (i.e. see the changed version next time you run the script) you need something else, like a database.
The simplest to use in this case would probably be the shelve module, which is based on pickle. You can also use pickle itself if you wish.
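Here is a minimal sketch with shelve. It stores the whole dictionary under one key ('addresses' is an arbitrary choice), and the helper names are hypothetical. Note that mutating a nested value read from a shelf does not write it back unless the shelf was opened with writeback=True, so the update function reassigns the key.

```python
import shelve


def save_addresses(path, address_dict):
    """Store the whole dictionary under a single key in a shelf file."""
    with shelve.open(path) as db:
        db['addresses'] = address_dict


def load_addresses(path):
    with shelve.open(path) as db:
        return db.get('addresses', {})


def update_password(path, address, new_password):
    with shelve.open(path) as db:
        addresses = db['addresses']  # an unpickled copy, not a live view
        addresses[address]['password'] = new_password
        db['addresses'] = addresses  # reassign so the change is written back
```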
take a look at pickle :)
http://docs.python.org/2/library/pickle.html
You can use it to dump objects to files and also to read them back in with any other python script.
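Here is the equivalent with pickle directly, as a sketch with hypothetical helper names. One script dumps the dictionary to a file, and any other script loads it back. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.

```python
import pickle


def dump_dict(path, data):
    with open(path, 'wb') as f:  # pickle writes bytes, so open in binary mode
        pickle.dump(data, f)


def load_dict(path):
    with open(path, 'rb') as f:
        return pickle.load(f)
```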

Recursively populating __all__ in __init__.py

I'm using the following code to populate __all__ in my module's __init__.py, and I was wondering if there was a more efficient way. Any ideas?
import fnmatch
import os
__all__ = []
for root, dirnames, filenames in os.walk(os.path.dirname(__file__)):
    root = root[len(os.path.dirname(__file__)):]
    for filename in fnmatch.filter(filenames, "*.py"):
        __all__.append(os.path.join(root, filename[:-3]))
You probably shouldn't be doing this: the default behaviour of import is quite flexible. If you don't want a module (or any other variable) to be automatically exported, give it a name that starts with _ and Python won't export it via from module import *. That's the standard Python way, and reinventing the wheel is considered unpythonic. Also, don't forget that other things besides modules may need exporting; once you set __all__, you'll need to find and export them as well.
Still, you ask how to best generate a list of your exportable modules. Since you can't export what's not present, I'd just check what modules of your own are known to your main module:
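import os
import sys
basedir = os.path.dirname(__file__)
for m in list(sys.modules):
    name = m.rsplit('.', 1)[-1]  # submodule 'pkg.sub' is bound as 'sub' here
    if name in locals() and not name.startswith('_'):  # Only export regular names
        mod = locals()[name]
        if getattr(mod, '__file__', None) and mod.__file__.startswith(basedir):
            print(name)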
sys.modules includes the names of every module that python has loaded, including many that have not been exported to your main module-- so we check if they're in locals().
This is faster than scanning your filesystem, and more robust than assuming that every .py file in your directory tree will somehow end up as a top-level submodule. Naturally you should run this code near the end of your __init__.py, when everything has been loaded.
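Here is a variant of the same idea as a reusable function (the name local_submodule_names is hypothetical). Instead of walking sys.modules, it scans a namespace dict directly for module objects whose file lies under basedir. In __init__.py it could be used as `__all__ = local_submodule_names(globals(), os.path.dirname(__file__))`, placed after the submodule imports.

```python
import os
import types


def local_submodule_names(namespace, basedir):
    """Names in `namespace` bound to modules whose file lives under basedir."""
    basedir = os.path.abspath(basedir)
    names = []
    for name, value in namespace.items():
        if name.startswith('_') or not isinstance(value, types.ModuleType):
            continue  # skip private names and anything that is not a module
        path = getattr(value, '__file__', None)
        if path and os.path.abspath(path).startswith(basedir + os.sep):
            names.append(name)
    return sorted(names)
```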
I work with a few complex packages that have sub-packages and sub-modules. I like to control this on a module by module basis. I use a simple package called auto-all which makes it easy (full disclosure - I am the author).
https://pypi.org/project/auto-all/
Here's an example:
from auto_all import start_all, end_all
# Define some internal stuff
start_all(globals())
# Define some external stuff
end_all(globals())
The reason I use this approach is mainly because of imports. As mentioned by alexis, you can implicitly make things private by prefixing object names with an underscore, however this can get messy or just impractical for imported objects. Consider the following code:
from pyspark.sql.session import SparkSession
If this appears in your module then you will be implicitly making SparkSession available to be accessed from outside the module. The alternative is to prefix all imported items with underscores, for example:
from pyspark.sql.session import SparkSession as _SparkSession
This also isn't ideal, so manually managing __all__ is the only way (I'm aware of) to manage what you make externally available.
You can easily do this by explicitly setting the contents of the __all__ variable (which is the pythonic way), but this can become tedious when managing a large number of objects, and can also lead to issues if a developer adds a new object and doesn't expose it by adding to the __all__ variable. This type of thing can slip through code reviews. Using simple helper functions to manage the variable contents makes this much easier.

Python includes, module scope issue

I'm working on my first significant Python project and I'm having trouble with scope issues and executing code in included files. My previous experience is with PHP.
What I would like to do is have one single file that sets up a number of configuration variables, which would then be used throughout the code. Also, I want to make certain functions and classes available globally. For example, the main file would include a single other file, and that file would load a bunch of commonly used functions (each in its own file) and a configuration file. Within those loaded files, I also want to be able to access the functions and configuration variables. What I don't want to do is to have to put the entire routine at the beginning of each (included) file to include all of the rest. Also, these included files are in various sub-directories, which is making it much harder to import them (especially if I have to re-import in every single file).
Anyway I'm looking for general advice on the best way to structure the code to achieve what I want.
Thanks!
In Python, it is common practice to have a bunch of modules that implement various functions and then have one single module that is the point of access to all the functions. This is basically the facade pattern.
An example: say you're writing a package foo, which includes the bar, baz, and moo modules.
~/project/foo
~/project/foo/__init__.py
~/project/foo/bar.py
~/project/foo/baz.py
~/project/foo/moo.py
~/project/foo/config.py
What you would usually do is write __init__.py like this:
from foo.bar import func1, func2
from foo.baz import func3, constant1
from foo.moo import func1 as moofunc1
from foo.config import *
Now, when you want to use the functions you just do
import foo
foo.func1()
print(foo.constant1)
# assuming config defines a config1 variable
print(foo.config1)
If you wanted, you could arrange your code so that you only need to write
import foo
At the top of every module, and then access everything through foo (which you should probably name "globals" or something to that effect). If you don't like namespaces, you could even do
from foo import *
and have everything as global, but this is really not recommended. Remember: namespaces are one honking great idea!
This is a two-step process:
In your module globals.py import the items from wherever.
In all of your other modules, do "from globals import *"
This brings all of those names into the current module's namespace.
Now, having told you how to do this, let me suggest that you don't. First of all, you are loading up the local namespace with a bunch of "magically defined" entities. This violates precept 2 of the Zen of Python, "Explicit is better than implicit." Instead of "from foo import *", try using "import foo" and then saying "foo.some_value". If you want to use the shorter names, use "from foo import mumble, snort". Either of these methods directly exposes the actual use of the module foo.py. Using the globals.py method is just a little too magic. The primary exception to this is in an __init__.py where you are hiding some internal aspects of a package.
Globals are also semi-evil in that it can be very difficult to figure out who is modifying (or corrupting) them. If you have well-defined routines for getting/setting globals, then debugging them can be much simpler.
I know that PHP has this "everything is one, big, happy namespace" concept, but it's really just an artifact of poor language design.
As far as I know, program-wide global variables/functions/classes/etc. do not exist in Python: everything is "confined" in some module (namespace). So if you want some functions or classes to be used in many parts of your code, one solution is to create modules like "globFunCl" (defining, or importing from elsewhere, everything you want to be "global") and "config" (containing configuration variables), and import those wherever you need them. If you don't like the idea of using nested namespaces, you can use:
from globFunCl import *
This way you'll "hide" namespaces (making names look like "globals").
I'm not sure what you mean by not wanting to "put the entire routine at the beginning of each (included) file to include all of the rest", I'm afraid you can't really escape from this. Check out the Python Packages though, they should make it easier for you.
This depends a bit on how you want to package things up. You can either think in terms of files or modules. The latter is "more pythonic", and enables you to decide exactly which items (and they can be anything with a name: classes, functions, variables, etc.) you want to make visible.
The basic rule is that for any file or module you import, anything directly in its namespace can be accessed. So if myfile.py contains definitions def myfun(...): and class myclass(...) as well as myvar = ... then you can access them from another file by
import myfile
y = myfile.myfun(...)
x = myfile.myvar
or
from myfile import myfun, myvar, myclass
Crucially, anything at the top level of myfile is accessible, including imports. So if myfile contains from foo import bar, then myfile.bar is also available.
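This matters for a shared config module. `import config` followed by `config.DEBUG` looks the value up on the shared module object each time. `from config import DEBUG` instead copies the current binding into the importing module, so later reassignments are not seen. The sketch below simulates config.py with a module object registered in sys.modules so that it runs as a single file. The name hypothetical_config is made up.

```python
import sys
import types

# Stand-in for a config.py file, registered in sys.modules the way a real
# import would register it, so this example runs as a single file.
config = types.ModuleType('hypothetical_config')
config.DEBUG = False
sys.modules['hypothetical_config'] = config


def module_a_enable_debug():
    import hypothetical_config
    hypothetical_config.DEBUG = True  # rebinds the attribute on the shared module


def module_b_reads_attribute():
    import hypothetical_config
    return hypothetical_config.DEBUG  # looked up on the module at call time


from hypothetical_config import DEBUG as copied_debug  # copies the binding now

module_a_enable_debug()
print(module_b_reads_attribute())  # True
print(copied_debug)                # False: the from-import kept the old value
```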
