python import behaviour: different objects from same file?

consider this:
/
├── test.py
└── lib/
    ├── __init__.py
    └── x/
        ├── __init__.py
        └── p.py
with p.py:
class P():
    pass

p1 = P()
With test.py:
import sys
import os
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib"))
import lib.x.p
import x.p
print(id(lib.x.p.p1))
print(id(x.p.p1))
Here I get different object IDs even though I am importing the same object from the same package/module. Can someone please explain this behaviour? It is very confusing, and I did not find any documentation about it.
Thanks!

Modules are cached in the dictionary sys.modules, using their dotted names as keys. Since you are importing the same module under two different dotted names, you end up with two copies of this module, and therefore with two copies of everything inside it.
The solution is easy: Don't do this, and try to avoid messing around with sys.path.

x.p and lib.x.p aren't the same module. They come from the same file, but Python doesn't determine a module's identity by its file; a module's identity is based on its package-qualified name. The module search logic may have found the same file for both modules, but they're still loaded and executed separately, and objects created in one module are distinct from objects created in another.
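A quick way to see this for yourself (a minimal sketch, reusing the question's layout) is to compare the cache entries and the class objects directly:

# Sketch: the same file imported under two different dotted names gives
# two distinct module objects, and therefore two copies of P and p1.
import sys
import lib.x.p
import x.p  # only resolvable because "lib" was appended to sys.path above

print(sys.modules["lib.x.p"] is sys.modules["x.p"])  # False: two cache entries
print(lib.x.p.P is x.p.P)                            # False: two copies of the class
print(lib.x.p.p1 is x.p.p1)                          # False: hence the different ids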


I have a file, myfile.py, which imports Class1 from file.py and file.py contains imports to different classes in file2.py, file3.py, file4.py.
In my myfile.py, can I access these classes or do I need to again import file2.py, file3.py, etc.?
Does Python automatically add all the imports included in the file I imported, and can I use them automatically?
Best practice is to import every module that defines identifiers you need, and use those identifiers as qualified by the module's name; I recommend using from only when what you're importing is a module from within a package. The question has often been discussed on SO.
Importing a module, say moda, from many modules (say modb, modc, modd, ...) that need one or more of the identifiers moda defines does not slow you down: moda's bytecode is loaded (and possibly built from its sources, if needed) only once, the first time moda is imported anywhere; every other import of the module then uses a fast path involving a cache (a dict mapping module names to module objects, accessible as sys.modules in case of need... if you first import sys, of course!-).
Python doesn't automatically introduce anything into the namespace of myfile.py, but you can access everything that is in the namespaces of all the other modules.
That is to say, if in file1.py you did from file2 import SomeClass and in myfile.py you did import file1, then you can access it within myfile as file1.SomeClass. If in file1.py you did import file2 and in myfile.py you did import file1, then you can access the class from within myfile as file1.file2.SomeClass. (These aren't generally the best ways to do it, especially not the second example.)
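To make the two access paths concrete, here is a hypothetical three-file layout matching the names used above (a sketch, not code you need to copy as-is):

# file2.py
class SomeClass:
    pass

# file1.py
import file2                     # plain import: file2 becomes an attribute of file1
from file2 import SomeClass      # from-import: SomeClass bound directly in file1

# myfile.py
import file1
obj_a = file1.SomeClass()        # reaches the class via the from-import in file1
obj_b = file1.file2.SomeClass()  # reaches it via the plain import in file1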
This is easily tested.
In the myfile module, you can either do from file import ClassFromFile2 or from file2 import ClassFromFile2 to access ClassFromFile2, assuming that the class is also imported in file.
This technique is often used to simplify the API a bit. For example, a db.py module might import various things from the modules mysqldb, sqlalchemy and some other helpers. Then everything can be accessed via the db module.
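As a rough sketch of such a facade (the re-exported names here are just examples, assuming sqlalchemy is installed):

# db.py -- facade module collecting the pieces callers actually need
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

__all__ = ["create_engine", "sessionmaker"]

# Callers then only ever do:  import db; engine = db.create_engine(...)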
If you are using a wildcard import, then yes: a wildcard import creates new names in your current namespace for the contents of the imported module. If not, you need to go through the namespace of the module you imported, as usual.

Why does __all__ work differently in packages than in modules?

The Python documentation for the import statement (link) contains the following:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module.
The Python documentation for modules (link) contains what is seemingly a contradictory statement:
if a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.
It then gives an example where an __init__.py file imports nothing, and simply defines __all__ to be some of the names of modules in that package.
I have tested both ways of using __all__, and both seem to work; indeed one can mix and match within the same __all__ value.
For example, consider the directory structure
foopkg/
    __init__.py
    foo.py
Where __init__.py contains
# Note no imports
def bar():
print("BAR")
__all__ = ["bar", "foo"]
NOTE: I know one shouldn't define functions in an __init__.py file. I'm just doing it to illustrate that the same __all__ can export both names that do exist in the current namespace, and those which do not.
The following code runs, seemingly auto-importing the foo module:
>>> from foopkg import *
>>> dir()
[..., 'bar', 'foo']
Why does the __all__ attribute have this strange double-behaviour?
The docs seem really unclear on how it is supposed to be used, only mentioning one of its two sides in each place I linked. I understand the overall purpose is to explicitly set the names imported by a wildcard import, but am confused by the additional, seemingly auto-importing behaviour. Is this just a magic shortcut that avoids having to write the import out as well?
The documentation is a bit hard to parse because it does not mention that packages generally also have the behavior of modules, including their __all__ attribute. The behavior of packages is necessarily a superset of the behavior of modules, because packages, unlike modules, can have sub-packages and sub-modules. Behaviors not related to that feature are identical between the two as far as the end-user is concerned.
The Python docs can be minimalistic at times. They do not bother to mention that:
A package's __init__.py performs all the module-like code for the package, including support for star-importing its direct attributes via __all__, just like a module does.
Modules support all the features of a package's __init__.py, except that they can't have sub-packages or sub-modules.
It goes without saying that to make a name refer to a sub-module, that sub-module has to be imported, hence the apparent, but not real, double standard.
Update: how does from M import * actually work?
The __all__ in __init__.py of folder foopkg works the same way as __all__ in foopkg.py
You can see why it auto-imports foo here: https://stackoverflow.com/a/54799108/12565014
The most important thing is to look at the CPython implementation: https://github.com/python/cpython/blob/fee552669f21ca294f57fe0df826945edc779090/Python/ceval.c#L5152
It basically loops through __all__ and tries to import each element of __all__.
That's why it auto-imports foo and also achieves whitelisting.
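In Python terms, the loop behaves roughly like this sketch (a simplification of the C code linked above, not the actual implementation):

import importlib

def star_import(pkg, namespace):
    # Approximation of "from pkg import *" when pkg defines __all__.
    for name in pkg.__all__:
        if not hasattr(pkg, name):
            # Name not bound in the package yet: try it as a submodule,
            # which is how "foo" gets auto-imported in the example above.
            importlib.import_module(f"{pkg.__name__}.{name}")
        namespace[name] = getattr(pkg, name)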

Integrate class attributes in client namespace

I want to define a bunch of attributes for use in a module that should also be accessible from other modules, because they're part of my interface contract.
I've put them in a data class in my module like this, but I want to avoid qualifying them every time, similar to how you use import * from a module:
from dataclasses import dataclass

@dataclass
class Schema:
    key1 = 'key1'
    key2 = 'key2'
and in the same module:
<mymodule.py>
print(my_dict[Schema.key1])
I would prefer to be able to do this:
print(my_dict[key1])
I was hoping for an equivalent syntax to:
from Schema import *
This would allow me to do this from other modules too:
<another_module.py>
from mymodule.Schema import *
but that doesn't work.
Is there a way to do this?
Short glossary
module - a python file that can be imported
package - a collection of modules in a directory that can also be imported; technically a package is itself a module
name - a binding to a value (often just called a "variable" in other languages); names can be imported from modules
Using import statements allows you to import either packages, modules, or names:
import xml # package
from xml import etree # also a package
from xml.etree import ElementTree # module
from xml.etree.ElementTree import TreeBuilder # name
# --- here is where it ends ---
from xml.etree.ElementTree.TreeBuilder import element_factory # does not work
The dots in such an import chain can only follow module objects, which packages and modules are, and names are not. So while it looks like we are just accessing attributes of objects, we are actually relying on a mechanism that ordinary objects simply don't support, which is why we can't import from within them.
In your particular case, a reasonable solution would be to simply turn the object that you wanted to hold the schema into a top-level module in your project:
schema.py
key1 = 'key1'
key2 = 'key2'
...
Which will give you the option to import them in the way that you initially proposed. Doing something like this to make common constants easily accessible in your project is not unusual, and the django framework for example uses a settings.py in the same manner.
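Consuming such a top-level schema.py then looks like this (a sketch; my_dict is just made-up example data):

# another_module.py
from schema import *          # brings key1, key2, ... into this namespace

my_dict = {'key1': 1, 'key2': 2}
print(my_dict[key1])          # no Schema. prefix needed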
One thing you should keep in mind is that names in python modules are effectively singletons, so their values can't be changed at runtime[1].
[1] They can, but it's so hacky that it should pretty much always be treated as not possible.

Recursively populating __all__ in __init__.py

I'm using the following code to populate __all__ in my module's __init__.py, and I was wondering if there is a more efficient way. Any ideas?
import fnmatch
import os

__all__ = []
for root, dirnames, filenames in os.walk(os.path.dirname(__file__)):
    root = root[len(os.path.dirname(__file__)):]
    for filename in fnmatch.filter(filenames, "*.py"):
        __all__.append(os.path.join(root, filename[:-3]))
You probably shouldn't be doing this: The default behaviour of import is quite flexible. If you don't want a module (or any other variable) to be automatically exported, give it a name that starts with _ and python won't export it. That's the standard python way, and reinventing the wheel is considered unpythonic. Also, don't forget that other things besides modules may need exporting; once you set __all__, you'll need to find and export them as well.
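For example (a hypothetical package layout), a leading underscore is enough to keep a helper out of a star-import when no __all__ is defined:

# mypkg/__init__.py
from . import utils      # picked up by "from mypkg import *"
from . import _helpers   # skipped by the star-import because of the leading _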
Still, you ask how to best generate a list of your exportable modules. Since you can't export what's not present, I'd just check what modules of your own are known to your main module:
import os
import sys

basedir = os.path.dirname(__file__)
for m in sys.modules:
    if m in locals() and not m.startswith('_'):  # Only export regular names
        mod = locals()[m]
        if '__file__' in mod.__dict__ and mod.__file__.startswith(basedir):
            print(m)
sys.modules includes the names of every module that python has loaded, including many that have not been exported to your main module-- so we check if they're in locals().
This is faster than scanning your filesystem, and more robust than assuming that every .py file in your directory tree will somehow end up as a top-level submodule. Naturally you should run this code near the end of your __init__.py, when everything has been loaded.
I work with a few complex packages that have sub-packages and sub-modules. I like to control this on a module by module basis. I use a simple package called auto-all which makes it easy (full disclosure - I am the author).
https://pypi.org/project/auto-all/
Here's an example:
from auto_all import start_all, end_all
# Define some internal stuff
start_all(globals())
# Define some external stuff
end_all(globals())
The reason I use this approach is mainly because of imports. As mentioned by alexis, you can implicitly make things private by prefixing object names with an underscore; however, this can get messy or just impractical for imported objects. Consider the following code:
from pyspark.sql.session import SparkSession
If this appears in your module then you will be implicitly making SparkSession available to be accessed from outside the module. The alternative is to prefix all imported items with underscores, for example:
from pyspark.sql.session import SparkSession as _SparkSession
This also isn't ideal, so manually managing __all__ is the only way (I'm aware of) to manage what you make externally available.
You can easily do this by explicitly setting the contents of the __all__ variable (which is the pythonic way), but this can become tedious when managing a large number of objects, and can also lead to issues if a developer adds a new object and doesn't expose it by adding to the __all__ variable. This type of thing can slip through code reviews. Using simple helper functions to manage the variable contents makes this much easier.
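For comparison, the fully manual version of the same idea looks like this (a sketch; the function name is made up and pyspark is assumed to be installed):

# mymodule.py
from pyspark.sql.session import SparkSession   # implementation detail, not exported

def useful_function():
    ...

# Only names listed here are pulled in by "from mymodule import *";
# SparkSession stays reachable as mymodule.SparkSession but isn't exported.
__all__ = ["useful_function"]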

How to import modules that are used in both the main code and a module correctly?

Let's assume I have a main script, main.py, that imports another Python file with import coolfunctions, and another one with import chores.
Now, suppose coolfunctions also uses stuff from chores, so I declare import chores inside coolfunctions as well.
Since both main.py and coolfunctions import chores, is this redundant? Is there any other way of doing this? Am I doing it correctly?
I'm also confused about how Python projects should be structured in general. I have a "conf.py" file that I import for a bunch of variables: is this a module or not? I load this conf file in multiple places as well.
If two modules want to use chores, then each one must import chores (or some equivalent import). Each import creates a name binding only in the namespace of the module that does the import; that is, import's namespace effect is local to a module's namespace.
This is good, because by looking at a module's code you can (barring pathological cases) know where each name is bound to by the import statements that explicitly bind modules or module attributes to names. Imports made in other modules won't affect this module's namespace.
Each module X should import all (and only) the modules Y, Z, T, ... whose functionality it requires, without any worry about what other modules Fee, Fie, Foo ... (if any) may have already done part or all of those imports, or may be going to do so in the future.
It would make a module extremely fragile (indeed, it would be the very opposite of modularity!) if each module had to worry about such subtle, "covert-channel" effects.
What other modules Y, Z, T, ..., each module X chooses to import (if any) is part of X's implementation details, and shouldn't concern anybody except the developers who are coding, testing, or maintaining X.
In order to ensure that this is the case, and that this clearly-best strategy of decoupling can and will fully be followed by sane code, Python "caches" modules as they get imported: a module is "loaded" only once per run of a program, the first time anybody imports it (or anything from inside it) -- all other imports use the same object obtained by that first loading, which Python keeps in a cache (which is specified as being the dict sys.modules, but you need to know that detail only for somewhat-advanced programming techniques... don't worry about it, 98.7% of the time -- just remember that "import is cheap"!-).
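A small sketch of that caching, reusing the chores module from the question (assuming chores.py is importable):

import sys
import chores
import chores as chores_again      # no reload, just another name binding

print(chores is chores_again)            # True: the exact same module object
print(sys.modules["chores"] is chores)   # True: the cache entry itself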
Sure, a conf.py that you use from several other modules via import conf is definitely a module (you may think you're loading it multiple times, but you aren't unless you're using pretty advanced and deliberate techniques indeed for the purpose) -- why shouldn't it be?
No, this isn't redundant - it's fine to import chores in both the main module and coolfunctions.
The exact import mechanics of Python are complex (for example, module imports are only done once, meaning in your case that the actual parsing and loading of the chores module will only happen once, which is a nice optimization) but in general you shouldn't worry about it because it just works.
Each Python file is a module, so your conf.py is also a module.
It is always the best practice to import all necessary modules in the file that uses them. Take for example:
A.py contains: import coolfunctions
B.py contains: import A
Main.py contains: import B and uses functions that are defined in A.py (this is possible because, by importing B, Main.py can reach everything that B imports, qualified through B's namespace, e.g. B.A.some_function).
If in the future you change B.py so that it no longer needs A.py and therefore remove its import A, then your Main.py will break, because A is no longer imported anywhere along that chain.
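A minimal sketch of those three files (contents made up for illustration) shows both the access path and the fragility:

# A.py
def helper():
    ...

# B.py
import A

# Main.py
import B
B.A.helper()   # works only because B imports A; if B drops "import A",
               # this line breaks unless Main.py imports A itself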
