Modifying/debugging frozen builtins like importlib/_boostrap_external.py

Modifying/debugging frozen builtins like importlib/_boostrap_external.py - python

Short version: How do I debug frozen libs? Can I tell Python not to use them (and use sourcefiles) or re-freeze them some how?
I'm trying to learn more about the inner functionality of pythons import mechanisms. And by going down that rabbit hole, I want to debug /usr/lib/python37/importlib/_bootstrap_external.py.
Using python -m trace --trace myscript.py gave me indications of where in _bootstrap_external.py I am landing, but not values. So I turned to pdb and it gave me a bit of information but appears to be skipping _frozen objects. (?).
So I added a few print('moo') where I saw my script ending up in _boostrap_external.py but no prints are ever made, because Python internally uses <class '_frozen_importlib_external.PathFinder'>
That lead me in a desperate attempt to try and renaming/removing /usr/lib/python37/importlib/__pycache__ in a hope that Python would re-compile my changes. But no such luck, Python still uses a frozen version.
So I modified /usr/lib/python37/__init__.py where it imports the _bootstrap module, so I changed this:
try:
import _frozen_importlib_external as _bootstrap_external
except ImportError:
from . import _bootstrap_external
_bootstrap_external._setup(_bootstrap)
_bootstrap._bootstrap_external = _bootstrap_external
else:
_bootstrap_external.__name__ = 'importlib._bootstrap_external'
_bootstrap_external.__package__ = 'importlib'
try:
_bootstrap_external.__file__ = __file__.replace('__init__.py', '_bootstrap_external.py')
except NameError:
# __file__ is not guaranteed to be defined, e.g. if this code gets
# frozen by a tool like cx_Freeze.
pass
sys.modules['importlib._bootstrap_external'] = _bootstrap_external
to this:
from . import _bootstrap_external
_bootstrap_external._setup(_bootstrap)
_bootstrap._bootstrap_external = _bootstrap_external
_bootstrap_external.__name__ = 'importlib._bootstrap_external'
_bootstrap_external.__package__ = 'importlib'
try:
_bootstrap_external.__file__ = __file__.replace('__init__.py', '_bootstrap_external.py')
except NameError:
# __file__ is not guaranteed to be defined, e.g. if this code gets
# frozen by a tool like cx_Freeze.
pass
sys.modules['importlib._bootstrap_external'] = _bootstrap_external
And that worked, ish. Obviously the problem will become more and more complex, and there has to be a more generic solution. But I continued my journey and found that _bootstrap.py is indeed loaded with my print('moo') changes. But when /usr/lib/python37/importlib/_bootstrap.py later calls the function _find_spec, and it does:
for finder in meta_path:
with _ImportLockContext():
find_spec = finder.find_spec
spec = find_spec(name, path, target)
find_spec will refer to the frozen version again. (Code snippet shortened to the relevant code in _bootstrap.py)
So my question finally ends up in, how do I debug these frozen files? or how do I make it so Python ignores frozen libraries (I can take the performance impact when debugging). Every attempt to find information on this (searching, docs, irc, etc) ends up on "why?" or "don't do this" or it points me to py2exe and how to freeze my libraries. I just want to be able to debug and understand more of how the inner mechanics work by trying things and looking at the variables.

Related

Load and execute a full python script from a raw link?

I'm facing some problems trying to load a full python script from my pastebin/github pages.
I followed this link, trying to convert the raw into a temp file and use it like a module: How to load a python script from a raw link (such as Pastebin)?
And this is my test (Using a really simple python script as raw, my main program is not so simple unfortunately): https://trinket.io/python/0e95ba50c8
When I run the script (that now is creating a temp file in the current directory of the .py file) I get this error:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\BOT\\Images\\tempxm4xpwpz.py'
Otherwise I also treid the exec() function... No better results unfortunately.
With this code:
import requests as rq
import urllib.request
def main():
code = "https://pastebin.com/raw/MJmYEKqh"
response = urllib.request.urlopen(code)
data = response.read()
exec(data)
I get this error:
File "<string>", line 10, in <module>
File "<string>", line 5, in hola
NameError: name 'printest' is not defined
Since my program is more complex compared to this simple test, I don't know how to proceed...
Basically What I want to achieve is to write the full script of my program on GitHub and connect it to a .exe so if I upgrade the raw also my program is updated. Avoiding to generate and share (only with my friends) a new .exe everytime...
Do you think is possible? If so.. what am I doing wrong?
PS: I'm also open to other possibilities to let my friends update the program without downloading everytime the .exe, as soon as they don't have to install anything (that's why I'm using .exe).

Disclaimer: it is really not a good idea to run an unverified (let alone untrusted) code. That being said if you really want to do it...
Probably the easiest and "least-dirty" way would be to run whole new process. This can be done directly in python. Something like this should work (inspiration from the answer you linked in your question):
import urllib.request
import tempfile
import subprocess
code = "https://pastebin.com/raw/MJmYEKqh"
response = urllib.request.urlopen(code)
data = response.read()
with tempfile.NamedTemporaryFile(suffix='.py') as source_code_file:
source_code_file.write(data)
source_code_file.flush()
subprocess.run(['python3', source_code_file.name])
You can also make your code with exec run correctly:
What may work:
exec(data, {}) -- All you need to do, is to supply {} as second argument (that is use exec(data, {})). Function exec may receive two additional optional arguments -- globals and locals. If you supply just one, it will use the same directory for locals. That is the code within the exec would behave like sort-of "clean" environment, at the top-level. Which is something you aim for.
exec(data, globals()) -- Second option is to supply the globals from your current scope. This will also work, though you probably has no need to give the execucted code access to your globals, given that that code will set-up everything inside anyway
What does not work:
exec(data, {}, {}) -- In this case the executed code will have two different dictionaries (albeit both empty) for locals and globals. As such it will behavie "as-in" (I'm not really sure about this part, but as I tested it, it seams as such) the function. Meaning that it will add the printest and hola functions to the local scope instead of global scope. Regardless, I expected it to work -- I expected it will just query the printest in the hola function from the local scope instead of global. However, for some reason the hola function in this case gets compiled in such a way it expects printest to be in global scope and not local, which is not there. I really did not figured out why. So this will result in the NameError
exec(data, globals(), locals()) -- This will provide access to the state from the caller function. Nevertheless, it will crash for the very same reason as in the previous case
exec(data) -- This is just a shorthand for exec(data, globals(), locals()

Reverse engineering Python Executable with unknown compiler

I have a python script compiled to an EXE without the knowledge of the compiler that was used.
I've been searching for an answer for a while but I can't seem to find anything about this. Most, if not all only show how to decompile Pyinstaller or py2exe using scripts that are on Github and some are just outdated. However, I have attempted something different. Like code execution through the application but from what I have tried, none have worked.
I've attempted PyMem, and Pynject along with some others.
pymem:
import pymem
import subprocess
try:
mem = pymem.Pymem("test.exe")
except:
subprocess.Popen("test.exe")
mem = pymem.Pymem("test.exe")
mem.inject_python_interpreter()
code = """
import os
os.system("cls")
print("some text")
# or anything really, I've tried to write functions or even print/write __file__ then nothing happens
"""
mem.inject_python_shellcode(code)
Any help would be appreciated.

Use different config files - decision should be made via command-line argument

I have a script, that uses a config file called config.py. Actually this is rather a configuration module then. Anyways: the configuration-module contains a lot of parameters and dictionaries and lists of dictionaries and so on.
In the script today it is used like this
import config
def main():
myParameter = config.myParameter
Now I have another application scenario for this script that uses a related config ('config_advanced.py', but the parameters and dictionaries have different values.
My goal is now, to chose the name of the used config-modul as a passed command-line argument:
myScript.py -configuration config_advanced.py
Since the configuration-module is in the same folder than the main script, I guess I have to rename the passed configuration file to 'config.py' first. Afterwards I can perform import config. Otherwise, if I used `import config_advanced, I wouldn't be able to use a call like
config.myParameter
in the main script.
Another possibility could be, to put the configuration-modules in subfolders and keep the name config.py. The passed command-line-argument will have to contain the subfolder then.
Either way I won't be able to perform the import at the top of the main file, since I have to do the argument parsing first. This isn't a technically problem, but someone said that this it at least bad pratice.
What do you think?
What is a better way to do the trick with not much effort?
Thanks a lot
Edit:
One working solution has been
import sys fullpath = "d:\\python\\scripts\\projectA\\configurationFiles\\"
sys.path.append(fullpath)
config = __import__('config_advanced')
Without syspath it does NOT work, so those following tries won't work:
config = __import__('d:\\python\\scripts\\projectA\\configurationFiles\\config_advanced')
config = __import__('d:\\python\\scripts\\projectA\\configurationFiles\\config_advanced.py')

Another possibility that's similar to what you suggest in the question, but which doesn't need you to hide things in subfolders, is to put config_advanced.py and config_plain.py in the same folder as the main script and then dynamically make config.py a link to the actual config file you want to use.
However, martineau's suggestion is much simpler.
OTOH, georg brings up a very valid point, especially if this script isn't just for your own personal use. While using Python itself for the config data is flexible and powerful, it's perhaps a little too powerful. Config data should just be data, not live executable code. If you make a minor mistake when modifying config data you could cause havoc if it's in an executable file. And if a malicious user gets to it, there's no limit to the damage they could cause.
Bad data in a plain old data file will at worst cause a ValueError if it does something weird that your config parsing code isn't suspecting. But bad data in a live Python file could throw all sorts of nasty errors. Or even worse, it could do something evil in complete silence...
In reply to your comments, here's some code to illustrate the first point:
#! /usr/bin/env python
import os
config_file = "config.py"
def link_config(mode):
if os.path.exists(config_file):
os.remove(config_file)
config_name = "config_%s.py" % mode
os.symlink(config_name, config_file)
#.... parse command line to determine config_mode string, then do
link_config(config_mode)
#Now import the newly-linked config file
import config
If config_mode == "plain" the above code will cause config_plain.py to be imported as 'config'
and if config_mode == "advanced" it will cause config_advanced.py to be imported as 'config'
But as I said before, martineau's method is much simpler. And IIRC, os.symlink may not work on non-unix systems.
...
As for your second point, check out the docs for the json module

Using object in different module than original script

I just started with Python and I'm having some problems. I've written already a few scripts for ArcGIS and had some recurring stuff. So I thought it would be smart to put that in modules which I can easily use again.
So now I have two scripts, script.py and toolbox.py.
My script was working fine so I copied and paste the part I needed, edited it a bit and everything goes well except for the messages created with gp.Addmessage
script.py will create the message "Hello Stackoverflow" but the messages from toolbox.py doesn't show up. Why is that? It loads the toolbox because I can use it later on, so it regocnizes the gp object.
I'm kind of stuck here, would love to be able to print messages from inside the modules to inform the user of the tool what is happening.
script.py:
import os, sys, arcgisscripting
# Create the Geoprocessor object
gp = arcgisscripting.create()
gp.AddMessage("# Hello Stackoverflow")
import toolbox
toolbox.loadToolbox
toolbox.py:
def loadToolbox:
try:
some code
gp.AddToolbox(path)
gp.AddMessage("# Toolbox loaded")
except:
gp.AddMessage("# Toolbox not found")

You have two problems with your code:
You never call the loadToolBox method, you only refer to it. Add ():
toolbox.loadToolbox()
Your loadToolbox() function doesn't take gp as an argument. If gp is meant to be a global, then it won't be visible to the toolbox module (globals are only visible in the current module).
Add gp as a parameter and pass it in when calling loadToolbox. In script.py:
toolbox.loadToolbox(gp)
and in toolbox.py:
def loadToolbox(gp):
# rest of function

_shutdown AttributeError (ignored) when linting code that uses M2Crypto

I'm running lint as follows:
$ python -m pylint.lint m2test.py
with this code:
import M2Crypto
def f():
M2Crypto.RSA.new_pub_key("").as_pem(cipher=None).split("\n")
The lint output ends with:
Exception AttributeError: '_shutdown' in <module 'threading' from '/usr/lib/python2.7/site-packages/M2Crypto-0.21.1-py2.7-linux-x86_64.egg/M2Crypto/threading.pyc'> ignored
This code works fine when run (the above is actually a minimal test case; but the full version does work). The exception is ignored, but Bitten considers this a failure, so stops on this step.
I've tried adding 'M2Crypto.threading.init()'/'M2Crypto.threading.cleanup()' around the definition of the function, but that didn't fix the problem.
How can I prevent this problem from occurring?
I'm using M2Crypto 0.21.1, pylint 0.24 and Python 2.7 (also tried 2.7.2) on Debian Lenny x86_64.

The exception that you are seeing is caused by a bug in the astng package (presumably “Abstract Syntax Tree, Next Generation”?) which is a toolkit on which pylint depends, written by the same people. I should note in passing that I always encourage people to use pyflakes instead of pylint when possible, because it is quick, simple, fast, and predictable, whereas pylint tries to do several kinds of deep magic that are not only slow but that can get it into exactly this kind of trouble. :)
Here are the two packages on PyPI:
http://pypi.python.org/pypi/pylint
http://pypi.python.org/pypi/astng
And note that this problem had to be, necessarily, a bug in pylint and not in your code, because pylint does not run your code in order to produce its report — imagine the havoc that could be wreaked if it did (since code being linted might delete files, etcetera)! Since your code does not get run, no amount of caution, like protecting your call with threading init() or cleanup() functions, could possibly have prevented this error — unless the code snippets happened, for other reasons, to alter the behavior we are about to investigate.
So, on to your actual exception.
I had never actually heard of _shutdown before! A quick search of the Python standard library showed its definition in threading.py but not a call of the function from anywhere; only by searching the Python C source code did I discover where in pythonrun.c, during interpreter shutdown, the function is actually called:
static void
wait_for_thread_shutdown(void)
{
...
PyObject *threading = PyMapping_GetItemString(tstate->interp->modules,
"threading");
if (threading == NULL) {
/* threading not imported */
PyErr_Clear();
return;
}
result = PyObject_CallMethod(threading, "_shutdown", "");
if (result == NULL) {
PyErr_WriteUnraisable(threading);
}
...
}
Apparently it is some sort of cleanup function that the threading Standard Library module requires, and they have special-cased the Python interpreter itself to make sure that it gets called.
As you can see from the code above, Python quietly and without complaint handles the case where the threading module never gets imported during a program's run. But if threading does get imported, and still exists at shutdown time, then the interpreter looks inside for a _shutdown function and goes so far as to print an error message — and then return a non-zero exit status, the cause of your problems — if it cannot call it.
So we have to discover why the threading module exists but has no _shutdown method at the moment when pylint is done examining your program and Python is exiting. Some instrumention is called for. Can we print out what the module looks like as pylint exits? We can! The pylint/lint.py module, in its last few lines, runs its “main program” by instantiating a Run class it has defined:
if __name__ == '__main__':
Run(sys.argv[1:])
So I opened lint.py in my editor — one of the magnificent things about having each little project installed in a Python Virual Environment is that I can jump in and edit third-party code for quick experiments — and added the following print statement down at the bottom of the Run class's __init__() method:
sys.path.pop(0)
print "*****", sys.modules['threading'].__file__ # added by me!
if exit:
sys.exit(self.linter.msg_status)
I re-ran the command:
python -m pylint.lint m2test.py
And out came the __file__ string of the threading module:
***** /home/brandon/venv/lib/python2.7/site-packages/M2Crypto/threading.pyc
Well, look at that.
This is the problem!
According to this path, there actually exists an M2Crypto/threading.py module that, under all normal circumstances, should just be called M2Crypto.threading, and therefore sit in the sys.modules dictionary under the name:
sys.modules['M2Crypto.threading']
But somehow that file is also getting loaded as the main Python threading module, shadowing the official threading module that sits in the Standard Library. Because of this, the Python exit logic is quite correctly complaining that the Standard Library _shutdown() function is missing.
How could this happen? Top-level modules can only appear in paths that are listed explicitly in sys.path, not in sub-directories beneath them. This leads to a new question: is there any point during the pylint run that the …/M2Crypto/ directory itself is getting put on sys.path as though it contained top-level modules? Let's see!
We need more instrumentation: we need to have Python tell us the moment that a directory with M2Crypto in the name appears in sys.path. It will really slow things down, but let's add a trace function to pylint's __init__.py — because that is the first module that gets imported when you run -m pylint.lint — that will write an output file telling us, for every line of code executed, whether sys.path has any bad values in it:
def install_tracer():
import sys
output = open('mytracer.out', 'w')
def mytracer(frame, event, arg):
broken = any(p.endswith('M2Crypto') for p in sys.path)
output.write('{} {}:{} {}\n'.format(
broken, frame.f_code.co_filename, frame.f_lineno, event))
return mytracer
sys.settrace(mytracer)
install_tracer()
del install_tracer
Note how careful I am here: I define only one name in the module's namespace, and then carefully delete it to clean up after myself before I let pylint continue loading! And all of the resources that the trace function itself needs — namely, the sys module and the output open file — are available in the install_tracer() closure so that, from the outside, pylint looks exactly the same as always. Just in case anyone tries to introspect it, like pylint might!
This generates a file mytracer.out of about 800k lines, that each look something like this:
False /home/brandon/venv/lib/python2.7/posixpath.py:118 call
The False says that sys.path looks clean, the filename and line number are the line of code being executed, and call indicates what stage of execution the interpreter is in.
So does sys.path ever get poisoned? Let's look at just the first True or False on each line, and see how many successive lines start with each value:
$ awk '{print$1}' mytracer.out | uniq -c
607997 False
3173 True
4558 False
33217 True
4304 False
41699 True
2953 False
110503 True
52575 False
Wow! That's a problem! For runs of several thousand lines at a time, our test case is True, which means that the interpreter is running with …/M2Crypto/ — or some variant of a pathname with M2Crypto in it — on the path, where it should not be; only the directory that contains …/M2Crypto should ever be on the path. Looking for the first False to True transition in the file, I see this:
False /home/brandon/venv/lib/python2.7/site-packages/logilab/astng/builder.py:132 line
False /home/brandon/venv/lib/python2.7/posixpath.py:118 call
...
False /home/brandon/venv/lib/python2.7/posixpath.py:124 line
False /home/brandon/venv/lib/python2.7/posixpath.py:124 return
True /home/brandon/venv/lib/python2.7/site-packages/logilab/astng/builder.py:133 line
And looking at lines 132 and 133 in the builder.py file reveals our culprit:
130 # build astng representation
131 try:
132 sys.path.insert(0, dirname(path)) # XXX (syt) iirk
133 node = self.string_build(data, modname, path)
134 finally:
135 sys.path.pop(0)
Note the comment, which is part of the original code, not an addition of my own! Obviously, XXX (syt) iirk is an exclamation in this programmer's strange native language for the phrase, “put this module's parent directory on sys.path so that pylint will break mysteriously every time someone forces pylint to introspect a package with a threading sub-module.” It is, obviously, a very compact native language. :)
If you adjust the tracing module to watch sys.modules for the actual import of threading — an exercise I will leave to the reader — you will see that it happens when SocketServer, which is imported by some other Standard Library module during the analysis, in turn tries to innocently import threading.
So let us review what is happening:
pylint is dangerous magic.
As part of its magic, if it sees you import foo, then it runs off trying to find foo.py on disk, to parse it, and to predict whether you are loading valid or invalid names from its namespace.
[See my comment, below.] Because you call .split() on the return value of RSA.as_pem(), pylint tries to introspect the as_pem() method, which in turn uses the M2Crypto.BIO module, which in turn makes calls that induce pylint to import threading.
As part of loading any module foo.py, pylint throws the directory containing foo.py on sys.path, even if that directory is inside a package, and therefore gives modules in that directory the privilege of shadowing Standard Library modules of the same name during its analysis.
When Python exits, it is upset that the M2Crypto.threading library is sitting where threading belongs, because it wants to run the _shutdown() method of threading.
You should report this as a bug to the pylint / astng folks at logilab.org. Tell them I sent you.
If you decide to keep using pylint after it has done this to you, then there seem to be two solutions in this case: either don't inspect code that calls M2Crypto, or import threading during the pylint import process — by sticking import threading into the pylint/__init__.py, for example — so that the module gets the chance to grab the sys.modules['threading'] slot before pylint gets all excited and tries to let M2Crypto/threading.py grab the slot instead.
In conclusion, I think the author of astng says it best: XXX (syt) iirk. Indeed.

Many thanks to Brandon Craig Rhodes for having tracing this down and for such a detailed post.
I've removed the offending line from astng, code available from the hg repository until logilab-astng 0.23.0 is out. And I can confirm this fixes the OP's pb.

This looks more like a hack but I think it works. Copying the result of "as_pem()" and splitting it.
import M2Crypto
def f():
M2Crypto.RSA.new_pub_key("").as_pem(cipher=None)[:].split("\n")
I'm using Python 2.6.7, M2Crypto 0.21.1, pylint 0.23

I was unable to reproduce (pylint 0.24 and M2Crypto 0.21.1 on Ubuntu 11.04 64bit) but two suggestions:
Explicitly initialize threading:
import M2Crypto
def f():
M2Crypto.threading.init()
M2Crypto.RSA.new_pub_key("").as_pem(cipher=None).split("\n")
M2Crypto.threading.cleanup()
Or recompile without threading:
m2crypto = Extension(name = 'M2Crypto.__m2crypto',
sources = ['SWIG/_m2crypto.i'],
extra_compile_args = ['-DTHREADING'],
#extra_link_args = ['-Wl,-search_paths_first'], # Uncomment to build Universal Mac binaries
)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Modifying/debugging frozen builtins like importlib/_boostrap_external.py - python

Related

Load and execute a full python script from a raw link?

Reverse engineering Python Executable with unknown compiler

Use different config files - decision should be made via command-line argument

Using object in different module than original script

_shutdown AttributeError (ignored) when linting code that uses M2Crypto

Categories

Resources