Python C extension importer: how to use the import lock

Background
I'm building an importer to encrypt my code. When compiling, I encrypt the code object in pyc files. When loading, I use this customized importer to decrypt the code object before execution.
Since my code is bundled in zip format, I decided to modify zipimporter from the built-in zipimport module by adding the decrypt logic to it.
I'm using Python 3.7.
Problem
I modified zipimport.c and made it into a C extension. The encryption and decryption process works fine, but I started to see ImportError and AttributeError. For example:
"""module: test.py"""
class Foo:
    def bar(self):
        pass
foo = Foo()
# ImportError: cannot import name 'foo' from test
# AttributeError: foo has no attribute bar
When the server starts, this error occurs randomly, and only during the first few minutes. So I suspect that it's a multithreading problem, caused by a thread seeing a partially initialized module object.
What I tried
After checking the original C code, I found that the implementation for loading a module is:
mod = PyImport_AddModuleObject(fullname);
// ... some code
mod = PyImport_ExecCodeModuleObject(fullname, code, modpath, NULL);
It first adds the module to sys.modules, then executes some C code, and finally executes the module code. If another thread reads sys.modules before the module code has finished executing, it can get the ImportError and AttributeError.
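To make the suspected race concrete, here is a pure-Python sketch of the same two-step pattern (publish an empty module in sys.modules, execute its body later) with a second thread importing in between. It is an illustration under assumptions, not the zzimporter code: partial_demo is a hypothetical module name, the module is a bare types.ModuleType, and two threading.Event objects force the unlucky interleaving deterministically. CPython's own import locking is not modelled.

```python
import sys
import threading
import types

MOD_NAME = "partial_demo"  # hypothetical module name used only for this sketch
BODY = "class Foo:\n    def bar(self):\n        pass\n\nfoo = Foo()\n"


def loader(registered, may_finish):
    # Step 1 (like PyImport_AddModuleObject): publish an empty module object.
    module = types.ModuleType(MOD_NAME)
    sys.modules[MOD_NAME] = module
    registered.set()
    # Step 2 (like PyImport_ExecCodeModuleObject): run the body a bit later.
    may_finish.wait()
    exec(BODY, module.__dict__)


def import_while_loading():
    """Import from another thread while the module body has not run yet."""
    sys.modules.pop(MOD_NAME, None)
    registered, may_finish = threading.Event(), threading.Event()
    worker = threading.Thread(target=loader, args=(registered, may_finish))
    worker.start()
    registered.wait()
    try:
        from partial_demo import foo  # noqa: F401  (module is still empty)
        outcome = "imported"
    except ImportError:
        outcome = "ImportError"
    finally:
        may_finish.set()  # let the loader finish executing the body
        worker.join()
    return outcome
```

Here import_while_loading() is expected to return "ImportError": the from-import finds the still-empty module in sys.modules and uses it as-is, which matches the symptom described above. Once the loader thread finishes, the same module does have foo.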
I just copied the original implementation, and it doesn't use the import lock. I guess the lock is normally handled by the interpreter at an outer level, and that I have to use the import lock explicitly. So I wrapped the code block with the import lock as follows:
#include "import.h"
_PyImport_AcquireLock();
// wrapped code block
_PyImport_ReleaseLock();
My Question
After adding the import lock, the error still happens. I'm not familiar with the Python C API. Have I used the lock correctly? Are there other possible reasons that can cause this error?
Code
To test, put the following files in the same folder:
zzimporter.c (lock added in function zipimport_zipimporter_load_module_impl)
zzimporter.h
setup.py
Run python setup.py build_ext --inplace and it will generate zzimporter.so, which works just like the built-in zipimport module.

Related

Why does Python run the other lines of code in a file (say test.py) when you are only importing a piece of test.py from somewhere else (say main.py)?

I have two python files. One is main.py which I execute. The other is test.py, from which I import a class in main.py.
Here is the sample code for main.py:
from test import Test
if __name__ == '__main__':
    print("You're inside main.py")
    test_object = Test()
And, here is the sample code for test.py:
class Test:
    def __init__(self):
        print("you're initializing the class.")
if __name__ == '__main__':
    print('You executed test.py')
else:
    print('You executed main.py')
Finally, here's the output, when you execute main.py:
You executed main.py
You're inside main.py
you're initializing the class.
From the order of the outputs above, you can see that once you import a piece of a file, the whole file gets executed immediately. I am wondering why. What's the logic behind that?
I'm coming from Java, where each file contains a single class with the same name. I'm just confused about why Python behaves this way.
Any explanation would be appreciated.
What is happening?
When you import the test module, the interpreter runs through it, executing it line by line. Since if __name__ == '__main__' evaluates to false, it executes the else clause. After this, execution continues past the from test import Test line in main.py.
Why does python execute the imported file?
Python is an interpreted language. Being interpreted means that the program is read and evaluated one line at a time. Going through the imported module, the interpreter needs to evaluate each line, as it has no way to discern which lines are useful to the module and which are not. For instance, a module could have variables that need to be initialized.
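A small experiment makes the whole-file execution visible. This sketch writes a hypothetical module, demo_test_module, modelled on test.py from the question, into a temporary directory, then performs the equivalent of from demo_test_module import Test by calling __import__ with a fromlist, which is what that statement uses under the hood.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module body, modelled on test.py from the question.
MODULE_SOURCE = '''
executed = ["top-level statement ran"]

class Test:
    def __init__(self):
        executed.append("Test() was instantiated")

if __name__ == "__main__":
    executed.append("run as a script")
else:
    executed.append("imported as " + __name__)
'''


def import_only_the_class(name="demo_test_module"):
    """Do the equivalent of 'from <name> import Test' and report what ran."""
    sys.modules.pop(name, None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(MODULE_SOURCE)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            # The whole module is executed first; only then is Test looked up on it.
            module = __import__(name, fromlist=["Test"])
        finally:
            sys.path.remove(tmp)
    return module.Test, module.executed
```

The returned log should be ["top-level statement ran", "imported as demo_test_module"]: every top-level statement ran even though only Test was requested, while Test.__init__ has not run because nothing has instantiated the class yet.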
Python is designed to support multiple paradigms. This behavior is used in some of the paradigms python supports, such as procedural programming.
Execution allows the designer of that module to account for different use cases. The module could be imported or run as a script. To accommodate this, some functions, classes or methods may need to be redefined. As an example, a script could output non-critical errors to the terminal, while an imported module could write them to a log file.
Why specify what to import?
Let's say you are importing two modules, both with a Test class. If everything from those modules is imported into your namespace, only one version of the Test class can exist in your program. We can resolve this issue using different syntax:
import package1
import package2
package1.Test()
package2.Test()
Alternatively, you can rename them with the as-keyword.
from package1 import Test
from package2 import Test as OtherTest
Test()
OtherTest()
Dumping everything into the global namespace (i.e. from test import *) pollutes the namespace of your program with a lot of definitions you might not need and might unintentionally overwrite or use.
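To see the clash that qualified imports avoid, the sketch below registers two hypothetical in-memory modules named package1 and package2 (built with types.ModuleType rather than real files), each defining its own Test class, and compares two star imports with two plain imports.

```python
import sys
import types


def make_module(name):
    """Register a tiny in-memory module; package1/package2 are hypothetical."""
    module = types.ModuleType(name)
    exec(f"class Test:\n    origin = {name!r}\n", module.__dict__)
    sys.modules[name] = module
    return module


def star_import_clash():
    make_module("package1")
    make_module("package2")
    namespace = {}
    # Both star imports bind the name Test; the second one silently wins.
    exec("from package1 import *\nfrom package2 import *", namespace)
    return namespace["Test"].origin


def qualified_imports_keep_both():
    make_module("package1")
    make_module("package2")
    import package1
    import package2
    # Each Test stays reachable through its own module name.
    return package1.Test.origin, package2.Test.origin
```

star_import_clash() returns "package2", because the first Test was overwritten without any warning, whereas qualified_imports_keep_both() returns ("package1", "package2").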
where all files included a single class with the same name
Python imposes no such requirement: you can put multiple classes, functions and values in a single .py file. For example,
class OneClass:
    pass
class AnotherClass:
    pass
def add(x, y):
    return x + y
def diff(x, y):
    return x - y
pi = 22/7
is a legal Python file.
According to an interview with Python's creator, the module mechanism in Python was influenced by the Modula-2 and Modula-3 languages. So maybe the right question is why the creators of those languages chose to implement modules that way.

Python NameError when importing functions from other files [duplicate]

I'm getting this error
Traceback (most recent call last):
File "/Users/alex/dev/runswift/utils/sim2014/simulator.py", line 3, in <module>
from world import World
File "/Users/alex/dev/runswift/utils/sim2014/world.py", line 2, in <module>
from entities.field import Field
File "/Users/alex/dev/runswift/utils/sim2014/entities/field.py", line 2, in <module>
from entities.goal import Goal
File "/Users/alex/dev/runswift/utils/sim2014/entities/goal.py", line 2, in <module>
from entities.post import Post
File "/Users/alex/dev/runswift/utils/sim2014/entities/post.py", line 4, in <module>
from physics import PostBody
File "/Users/alex/dev/runswift/utils/sim2014/physics.py", line 21, in <module>
from entities.post import Post
ImportError: cannot import name Post
and you can see that I use the same import statement further up and it works. Is there some unwritten rule about circular importing? How do I use the same class further down the call stack?
See also What happens when using mutual or circular (cyclic) imports in Python? for a general overview of what is allowed and what causes a problem WRT circular imports. See What can I do about "ImportError: Cannot import name X" or "AttributeError: ... (most likely due to a circular import)"? for techniques for resolving and avoiding circular dependencies.
I think the answer by jpmc26, while by no means wrong, comes down too heavily on circular imports. They can work just fine, if you set them up correctly.
The easiest way to do so is to use the import my_module syntax rather than from my_module import some_object. The former will almost always work, even if my_module imports us back. The latter only works if some_object is already defined in my_module, which in a circular import may not be the case.
To be specific to your case: Try changing entities/post.py to do import physics and then refer to physics.PostBody rather than just PostBody directly. Similarly, change physics.py to do import entities.post and then use entities.post.Post rather than just Post.
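The difference between the two import styles can be checked with two throwaway modules. This is a sketch under assumptions: cyc_post and cyc_physics are hypothetical flat stand-ins for entities/post.py and physics.py, written to a temporary directory. The first pair uses from ... import and should fail with ImportError; the second uses plain import and defers attribute lookups to call time, so the cycle resolves.

```python
import importlib
import sys
import tempfile
from pathlib import Path

FROM_STYLE = {
    "cyc_post": "from cyc_physics import PostBody\n\nclass Post:\n    pass\n",
    "cyc_physics": "from cyc_post import Post\n\nclass PostBody:\n    pass\n",
}
IMPORT_STYLE = {
    "cyc_post": (
        "import cyc_physics\n\n"
        "class Post:\n"
        "    def body_class(self):\n"
        "        return cyc_physics.PostBody  # looked up at call time\n"
    ),
    "cyc_physics": (
        "import cyc_post\n\n"
        "class PostBody:\n"
        "    def post_class(self):\n"
        "        return cyc_post.Post  # looked up at call time\n"
    ),
}


def import_cycle(sources, entry="cyc_post"):
    """Write the modules to a temp dir, import `entry`, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, code in sources.items():
            Path(tmp, name + ".py").write_text(code)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module(entry)
            return "ok"
        except ImportError:
            return "ImportError"
        finally:
            sys.path.remove(tmp)
            for name in sources:
                sys.modules.pop(name, None)
```

import_cycle(FROM_STYLE) is expected to return "ImportError" and import_cycle(IMPORT_STYLE) to return "ok".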
When you import a module (or a member of it) for the first time, the code inside the module is executed sequentially like any other code; e.g., it is not treated any differently than the body of a function. An import is just a command like any other (assignment, a function call, def, class). Assuming your imports occur at the top of the script, here's what's happening:
When you try to import World from world, the world script gets executed.
The world script imports Field, which causes the entities.field script to get executed.
This process continues until you reach the entities.post script, because you tried to import Post.
The entities.post script causes the physics module to be executed because it tries to import PostBody.
Finally, physics tries to import Post from entities.post.
I'm not sure whether the entities.post module exists in memory yet, but it really doesn't matter. Either the module is not in memory, or the module doesn't yet have a Post member because it hasn't finished executing to define Post.
Either way, an error occurs because Post is not there to be imported.
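Whether the half-imported module "exists in memory yet" can be probed directly. In this sketch, probe_post and probe_physics are hypothetical modules written to a temporary directory; the physics stand-in records what it can see of the post module while that module is still executing its first line.

```python
import importlib
import sys
import tempfile
from pathlib import Path

SOURCES = {
    # Stand-in for entities/post.py: imports physics before defining Post.
    "probe_post": "import probe_physics\n\nclass Post:\n    pass\n",
    # Stand-in for physics.py: records what it sees of probe_post at import time.
    "probe_physics": (
        "import sys\n"
        "_partial = sys.modules.get('probe_post')\n"
        "post_in_sys_modules = _partial is not None\n"
        "post_has_Post_yet = hasattr(_partial, 'Post')\n"
    ),
}


def probe():
    with tempfile.TemporaryDirectory() as tmp:
        for name, code in SOURCES.items():
            Path(tmp, name + ".py").write_text(code)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("probe_post")
            physics = sys.modules["probe_physics"]
            return physics.post_in_sys_modules, physics.post_has_Post_yet
        finally:
            sys.path.remove(tmp)
            for name in SOURCES:
                sys.modules.pop(name, None)
```

probe() is expected to return (True, False): CPython places a module in sys.modules before running its body, so the module already exists, but Post has not been defined yet.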
So no, it's not "working further up in the call stack". This is a stack trace of where the error occurred, which means it errored out trying to import Post in that module. You shouldn't use circular imports. At best, they have negligible benefit (typically, no benefit), and they cause problems like this. They burden any developer maintaining the code, forcing them to walk on eggshells to avoid breaking it. Refactor your module organization.
To understand circular dependencies, you need to remember that Python is essentially a scripting language. Statements outside functions and methods are executed when the module is first loaded (what this answer loosely calls "compile time"). Import statements are executed just like function calls, and to understand them you should think of them as function calls.
When you do an import, what happens depends on whether the file you are importing already exists in the module table. If it does, Python uses whatever is currently in the symbol table. If not, Python begins reading the module file, compiling/executing/importing whatever it finds there. Symbols referenced at compile time are found or not, depending on whether they have been seen, or are yet to be seen by the compiler.
Imagine you have two source files:
File X.py
def X1():
    return "x1"
from Y import Y2
def X2():
    return "x2"
File Y.py
def Y1():
    return "y1"
from X import X1
def Y2():
    return "y2"
Now suppose you compile file X.py first (for example, because another module runs import X). The compiler begins by defining the function X1, and then hits the import statement in X.py. This causes the compiler to pause compilation of X.py and begin compiling Y.py. Shortly thereafter the compiler hits the import statement in Y.py. Since X.py is already in the module table, Python uses the existing incomplete X.py symbol table to satisfy any references requested. Any symbols appearing before the import statement in X.py are now in the symbol table, but any symbols after it are not. Since X1 appears before the import statement, it is successfully imported. Python then resumes compiling Y.py. In doing so it defines Y2 and finishes compiling Y.py. It then resumes compilation of X.py, and finds Y2 in the Y.py symbol table. Compilation eventually completes without error.
Something very different happens if Y.py is compiled first. While compiling Y.py, the compiler hits the import statement before it defines Y2. Then it starts compiling X.py. Soon it hits the import statement in X.py that requires Y2. But Y2 is undefined, so the compile fails.
Please note that if you modify X.py to import Y1, the compile will always succeed, no matter which file you compile first. However, if you modify Y.py to import symbol X2, neither file will compile.
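The order dependence described above can be reproduced with the two files, once the def statements get the parentheses they need. This sketch writes them to a temporary directory and imports one of them first (standing in for "compiling" it first); the optional arguments let you try the X-imports-Y1 and Y-imports-X2 variants from the previous paragraph.

```python
import importlib
import sys
import tempfile
from pathlib import Path

X_SRC = "def X1():\n    return 'x1'\n\nfrom Y import Y2\n\ndef X2():\n    return 'x2'\n"
Y_SRC = "def Y1():\n    return 'y1'\n\nfrom X import X1\n\ndef Y2():\n    return 'y2'\n"


def import_first(first, x_src=X_SRC, y_src=Y_SRC):
    """Import module `first` ("X" or "Y") and report whether the cycle resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "X.py").write_text(x_src)
        Path(tmp, "Y.py").write_text(y_src)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module(first)
            return "ok"
        except ImportError:
            return "ImportError"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("X", None)
            sys.modules.pop("Y", None)
```

With the files as written, import_first("X") should return "ok" and import_first("Y") should return "ImportError". Passing y_src=Y_SRC.replace("import X1", "import X2") makes both orders fail, while x_src=X_SRC.replace("import Y2", "import Y1") makes both succeed.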
Whenever module X, or any module imported by X, might import the current module, do NOT use:
from X import Y
Any time you think there may be a circular import you should also avoid compile time references to variables in other modules. Consider the innocent looking code:
import X
z = X.Y
Suppose module X imports this module before this module imports X. Further suppose Y is defined in X after the import statement. Then Y will not be defined when this module is imported, and you will get an error (an AttributeError). If this module imports X first, you can get away with it. But when one of your co-workers innocently changes the order of definitions in a third module, the code will break.
In some cases you can resolve circular dependencies by moving an import statement down below symbol definitions needed by other modules. In the examples above, definitions before the import statement never fail. Definitions after the import statement sometimes fail, depending on the order of compilation. You can even put import statements at the end of a file, so long as none of the imported symbols are needed at compile time.
Note that moving import statements down in a module obscures what you are doing. Compensate for this with a comment at the top of your module something like the following:
#import X (actual import moved down to avoid circular dependency)
In general this is a bad practice, but sometimes it is difficult to avoid.
For those of you who, like me, come to this issue from Django, you should know that the docs provide a solution:
https://docs.djangoproject.com/en/1.10/ref/models/fields/#foreignkey
"...To refer to models defined in another application, you can explicitly specify a model with the full application label. For example, if the Manufacturer model above is defined in another application called production, you’d need to use:
class Car(models.Model):
    manufacturer = models.ForeignKey(
        'production.Manufacturer',
        on_delete=models.CASCADE,
    )
This sort of reference can be useful when resolving circular import dependencies between two applications...."
I was able to import the module within the function (only) that needs the objects from that module:
def my_func():
    import foo  # hypothetical module that defines class Foo
    foo_instance = foo.Foo()
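A quick way to check the function-level import approach is to put it into a small cycle. In this sketch, late_post and late_physics are hypothetical modules written to a temporary directory; late_physics only imports Post inside make_post(), so nothing in the cycle is looked up while the modules are still loading.

```python
import importlib
import sys
import tempfile
from pathlib import Path

SOURCES = {
    "late_post": "from late_physics import PostBody\n\nclass Post:\n    pass\n",
    "late_physics": (
        "def make_post():\n"
        "    from late_post import Post  # deferred: runs only when called\n"
        "    return Post()\n\n"
        "class PostBody:\n"
        "    pass\n"
    ),
}


def deferred_import_demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name, code in SOURCES.items():
            Path(tmp, name + ".py").write_text(code)
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("late_post")  # no cycle is hit at load time
            physics = sys.modules["late_physics"]
            return type(physics.make_post()).__name__  # late_post is complete now
        finally:
            sys.path.remove(tmp)
            for name in SOURCES:
                sys.modules.pop(name, None)
```

deferred_import_demo() is expected to return "Post": by the time make_post() runs, late_post has finished executing, so the deferred from-import finds Post.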
If you run into this issue in a fairly complex app, it can be cumbersome to refactor all your imports. PyCharm offers a quick-fix for this that will automatically change all usages of the imported symbols as well.
I was using the following:
from module import Foo
foo_instance = Foo()
but to get rid of the circular reference I did the following, and it worked:
import module
foo_instance = module.Foo()
According to this answer, we can import another module's object inside a block (like a function or method) without a circular import error occurring. For example, to import the Simple object from the another.py module, you can use this:
def get_simple_obj():
    from another import Simple
    return Simple
class Example(get_simple_obj()):
    pass
class NotCircularImportError:
    pass
In this situation, another.py module can easily import NotCircularImportError, without any problem.
Just check your file name and make sure it is not the same as the library you are importing.
E.g. a file named sympy.py that contains:
import sympy as sym
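One way to confirm this kind of shadowing is to ask the path-based finder which file a name resolves to. This sketch uses importlib.machinery.PathFinder.find_spec, which searches the given path list without consulting sys.modules. To stay runnable without sympy installed, it uses the standard-library module colorsys as the shadowed name; that choice is only for illustration.

```python
import importlib
import importlib.machinery
import sys
import tempfile
from pathlib import Path


def resolve(name, search_path):
    """Return the file the path-based finder would load for `name`."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


def shadow_demo(name="colorsys"):
    """Check that a same-named file earlier on the path wins over the stdlib one."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text("# oops: named like a stdlib module\n")
        importlib.invalidate_caches()
        local = resolve(name, [tmp] + sys.path)  # the script's dir comes first
        real = resolve(name, sys.path)           # without the local file in front
        return Path(local).resolve().parent == Path(tmp).resolve(), real
```

The first value of shadow_demo() should be True (the local file is picked), and the second should point at the real colorsys module. Calling resolve("sympy", sys.path) from your project directory is a quick way to see which file an import of sympy would actually load.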

Python Module Update

I have a main file that imports a class from another file as such:
from pybrain.rl.environments.HoldemTask import HoldemTask
When I change HoldemTask.py, the changes are not reflected in the main file. The only workaround I have found is to run PyBrain's
python setup.py install
Can I reload the module or something? reload() doesn't seem to work.
First off: python setup.py install generally makes a copy of the code it is installing, so if you find that you need to run it before changes take effect, chances are that for development you should adjust your PYTHONPATH or sys.path so that your relevant imports come directly from the source tree rather than from the Python site-packages library. You can quickly check which file your code is importing by putting this at the top of the main file when you run it:
from pybrain.rl.environments import HoldemTask # module object, not class
print(HoldemTask.__file__)
Secondly, in general it is far better to restart a Python process when making code changes to ensure that they come into effect. If you really need to get changes to show up without a restart, read on.
Reloading a module in Python only affects future imports. For a reload to work in-process, you have to replace the imported class object after the reload. For example, in the context of the "main file" performing the import you listed (inside a class method or function is fine):
from importlib import reload  # Python 3.4+; in Python 2, reload() is a builtin
# we need a module object to reload(), not the class inside it
from pybrain.rl.environments import HoldemTask as HoldemTask_module
reload(HoldemTask_module)
# we then need to replace the old class object with the reloaded one
# in the main file's module-wide (aka "global") namespace
global HoldemTask
HoldemTask = HoldemTask_module.HoldemTask
One final caveat here is that any existing HoldemTask objects will continue to use the old code, because they embed in themselves a reference to the pre-reload class object. The only way for an in-process reload to be complete is if the code is specifically written to throw away every instance of anything it made based on the original module.
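The caveat about stale class objects can be observed directly. This sketch uses importlib.reload (Python 3's home for reload()) on a hypothetical module, reload_demo_mod, in a temporary directory that stands in for HoldemTask.py; bytecode writing is switched off for the demo so that rewriting the file within the same second cannot pick up a stale .pyc.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo(name="reload_demo_mod"):  # hypothetical module name
    """Return (old class version, reloaded version, old instance is new class?)."""
    sys.modules.pop(name, None)
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, name + ".py")
        source.write_text("class HoldemTask:\n    version = 1\n")
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # never write a .pyc that could go stale
        try:
            module = importlib.import_module(name)
            HoldemTask = module.HoldemTask      # like "from ... import HoldemTask"
            old_instance = HoldemTask()

            source.write_text("class HoldemTask:\n    version = 2  # edited\n")
            importlib.reload(module)            # re-executes the edited file

            stale_version = HoldemTask.version  # old name, old class object
            HoldemTask = module.HoldemTask      # rebind, as the answer suggests
            return stale_version, HoldemTask.version, isinstance(old_instance, HoldemTask)
        finally:
            sys.dont_write_bytecode = old_flag
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

reload_demo() is expected to return (1, 2, False): the name bound before the reload still points at the old class, rebinding picks up the new one, and the instance created earlier is not an instance of the reloaded class.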

Load module from python string without executing code

I'm trying to do something similar with: Load module from string in python
Sometimes I read source code provided by the user:
module = imp.new_module('AppModule')
exec_(s, module.__dict__)
sys.modules['AppModule'] = module
return module
And sometimes import from a file
module = __import__(module_name, fromlist=_fromlist)
return module
These modules are loaded to map objects, and they'll be used inside an IDE.
My problem is: if there is any function call outside the if __name__ == '__main__': block, that code is executed and can interact with the IDE.
How can I import modules without executing those calls?
The process of importing a module requires that its code be executed. The interpreter creates a new namespace and populates it by executing the module's code with the new namespace as the global namespace, after which you can access those values (remember that the def and class statements are executable).
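The answer's point can be seen with a modern equivalent of the loading code from the question. This sketch swaps imp.new_module (removed in Python 3.12) for types.ModuleType and six's exec_ for the built-in exec; AppModule and the user source are hypothetical.

```python
import sys
import types


def load_from_string(name, source):
    """Modern stand-in for imp.new_module(name) + exec_(source, module.__dict__)."""
    module = types.ModuleType(name)
    code = compile(source, f"<{name}>", "exec")
    exec(code, module.__dict__)  # every top-level statement runs here
    sys.modules[name] = module
    return module


USER_SOURCE = '''
log = []

def helper():
    log.append("helper() ran during loading")

helper()  # a top-level call outside the __main__ guard

if __name__ == "__main__":
    log.append("guarded code ran")
'''
```

After module = load_from_string("AppModule", USER_SOURCE), module.log holds only "helper() ran during loading": the unguarded call executed while loading, the def statement created module.helper, and the guarded branch was skipped because __name__ is "AppModule".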
So maybe you will have to educate your users not to write modules that interact with the IDE?

Recursive module import and reload

Can someone explain why executing the following code:
file "hello.py":
import hello
print "hello"
hello = reload(hello)
executing as python hello.py prints the following?
hello
hello
hello
hello
Why 4 times? I know that when a module is already imported it's not imported again, but reload forces a module to be reloaded even if it's already loaded. I'd have expected an unlimited number of 'hello' prints.
What has to happen so reload won't reload a module?
python hello.py (A) runs the code once. When (A) calls import hello, the code is run again (B). When (A) and (B) call reload(hello), the code is run twice more, for four times in total.
In general, for the lifetime of a program a module's code will be executed at the following times:
Once if it is the main module
When it is imported the first time by any module (including itself)
Any time reload() is called on the module
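The count can be reproduced on Python 3 with a port of hello.py that calls importlib.reload, since reload() is no longer a builtin there. This sketch runs the script in a fresh interpreter via subprocess; the port is an assumption of this example, not the original Python 2 code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Python 3 port of hello.py from the question.
HELLO_PY = (
    "import importlib\n"
    "import hello\n"
    "print('hello')\n"
    "hello = importlib.reload(hello)\n"
)


def count_hellos():
    """Run hello.py as a script and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp, "hello.py")
        script.write_text(HELLO_PY)
        # -E ignores PYTHON* environment variables so the run is reproducible.
        result = subprocess.run(
            [sys.executable, "-E", str(script)],
            capture_output=True, text=True, check=True, cwd=tmp,
        )
    return result.stdout.splitlines()
```

count_hellos() is expected to return four "hello" lines, matching the breakdown above, because Python 3's importlib.reload also returns early for a module it is already in the middle of reloading.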
As for why reload() is not called recursively, there is an early exit point in the PyImport_ReloadModule() function (CPython, file import.c) to prevent this:
http://svn.python.org/view/python/trunk/Python/import.c?view=markup#l2646
...
existing_m = PyDict_GetItemString(modules_reloading, name);
if (existing_m != NULL) {
    /* Due to a recursive reload, this module is already
       being reloaded. */
    Py_INCREF(existing_m);
    return existing_m;
}
/* ... load module code is below here */
reload keeps a list (actually a dict) of modules it is currently reloading to avoid reloading modules recursively.
See http://hg.python.org/cpython/file/e6b8202443b6/Lib/imp.py#l236
This isn't documented, as such, but I think you can probably rely on it remaining the case.
