The Python documentation states that pickle is not secure and should never be used to load data from an untrusted source. If you research this, almost all examples demonstrate the risk with a system() call via os.system.
What's not clear to me is how os.system is resolved correctly without the os module being imported.
>>> import pickle
>>> pickle.loads("cos\nsystem\n(S'ls /'\ntR.") # This clearly works.
bin boot cgroup dev etc home lib lib64 lost+found media mnt opt proc root run sbin selinux srv sys tmp usr var
0
>>> dir() # no os module
['__builtins__', '__doc__', '__name__', '__package__', 'pickle']
>>> os.system('ls /')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'os' is not defined
>>>
Can someone explain?
The name of the module (os) is part of the opcode, and pickle automatically imports the module:
# pickle.py
def find_class(self, module, name):
    # Subclasses may override this
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass
Note the __import__(module) line.
This function is called when the GLOBAL 'os system' pickle opcode is executed.
This mechanism is necessary in order to be able to unpickle instances of classes whose modules haven't been explicitly imported into the caller's namespace.
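find_class is also the natural place to lock unpickling down. A minimal sketch (Python 3 syntax, modeled on the RestrictedUnpickler example in the pickle documentation) that allows only a few harmless builtins and refuses everything else, including os.system:
import builtins
import io
import pickle

# The only globals we are willing to resolve while unpickling.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

With this in place, restricted_loads(b"cos\nsystem\n(S'ls /'\ntR.") raises UnpicklingError instead of running a shell command.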
For altogether too much information on writing malicious Pickles that go much further than the standard os.system() example, see this presentation and its accompanying paper.
If you use pickletools.dis to disassemble the pickle you can see how this is working:
import pickletools
pickletools.dis("cos\nsystem\n(S'ls ~'\ntR.")  # dis() prints the disassembly itself
Output:
0: c GLOBAL 'os system'
11: ( MARK
12: S STRING 'ls ~'
20: t TUPLE (MARK at 11)
21: R REDUCE
22: . STOP
Pickle uses a simple stack-based virtual machine that records the instructions used to reconstruct the object. In other words, the pickled instructions in your example are:
Push self.find_class(module_name, class_name), i.e. push os.system
Push the string 'ls ~'
Build a tuple from the topmost stack items
Apply the callable to the argument tuple, both popped off the stack, i.e. call os.system(*('ls ~',))
Source
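You don't have to hand-craft opcodes to reproduce this pattern; any object can steer GLOBAL/REDUCE through its __reduce__ method. A minimal sketch (the class name Demo is made up; it only disassembles the payload rather than loading it):
import pickle
import pickletools

class Demo(object):
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call os.system('ls ~')".
        import os
        return (os.system, ('ls ~',))

payload = pickle.dumps(Demo(), protocol=0)
# Prints a GLOBAL / TUPLE / REDUCE sequence like the one above
# (on Unix the module may show up as 'posix' rather than 'os').
pickletools.dis(payload)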
Importing a module normally just binds a name in the importing namespace. __import__ doesn't even do that: it imports the module (registering it in sys.modules) and returns it, without binding any name for you:
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> __import__('os')
<module 'os' from '/usr/lib64/python2.7/os.pyc'>
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
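A short sketch of the same point, runnable as a script: the module really is imported and cached; only the name binding is missing.
import sys

mod = __import__('os')        # imports and returns the module object
print('os' in sys.modules)    # True: the module is cached globally
print('os' in dir())          # False: no name 'os' was bound here
os = mod                      # binding the name is a separate step
print(os.path.sep)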
Related
I'm trying to integrate Flask with Dill to dump/load Python sessions on the server side. The code below has two functions: the first one sets the value of x to zero and imports the datetime library; the second one increments x by 1 and gets the timestamp.
The first function dumps the session and the second function loads it.
In the dump, the pickle file is generated correctly, but I cannot reuse x or get the timestamp.
This is the error when I try to execute x = x + 1 in the second function:
UnboundLocalError: local variable 'x' referenced before assignment
Can Dill be used with Flask? Do I need a different approach?
The code:
from flask import Flask
from dill import dump_session, load_session
app = Flask(__name__)
app.config['SECRET_KEY'] = 'super secret'
session_file = '/tmp/session.pkl'
@app.route('/start_counter')
def start_counter():
    import datetime
    x = 0
    dump_session(filename=session_file)
    return 'New counter started!'

@app.route('/count')
def count():
    load_session(filename=session_file)
    x = x + 1
    now = datetime.datetime.now()
    dump_session(filename=session_file)
    return str(x) + '-' + str(now)
How to fix?
To make things simple, you need a data structure to hold your application state. I would use a dict because it's simple, but you can define a class for that too.
The easy (and tedious) way is to call state = dill.load(f) and dill.dump(state, f) on an open file every time you need your application state.
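A minimal sketch of that explicit approach, assuming a plain dict for the state and a file path of my own choosing:
import datetime
import dill

STATE_FILE = '/tmp/state.pkl'

def save_state(state):
    with open(STATE_FILE, 'wb') as f:
        dill.dump(state, f)

def load_state():
    with open(STATE_FILE, 'rb') as f:
        return dill.load(f)

state = {'x': 0}
save_state(state)

state = load_state()
state['x'] += 1
state['now'] = datetime.datetime.now()
save_state(state)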
This will work if your application is small. If you need to maintain a proper application state, you should use a database.
Ok. But WHAT happened here?
There are no compatibility issues with dill and Flask.
When you call dill.dump_session() it saves the state of __main__.
But when you increment x in the count() function, it is undefined, because it was never saved by dill.
An easy way to see that is to put a breakpoint() before x = x + 1, or to print the variable inside a try..except clause:
try:
    print(x)
except Exception as ee:
    print(ee)
x = x + 1
So it didn't work because the variable x wasn't defined in __main__ but in the scope of the start_counter() function, and dill.load_session() restores only the stuff in __main__.
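A quick illustration of that scoping point, runnable as a script (the function name start is made up): function locals never become module-level names, so a session dump of __main__ can never contain them.
def start():
    x = 0             # local to start(); never becomes a module name
    return x

start()
print('x' in dir())   # False: dump_session() could not have seen x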
And what does that __main__ stuff mean?
Let's see that using a Repl:
~/$ python
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
Perfect. We have an empty python interpreter. That dir() shows what we have in __main__.
Now we'll load some libraries, assign a variable, and define a function just because we can:
>>> import pandas, numpy, dill, pickle, json, datetime
>>> foo = "bar"
>>> def functionWithUglyName():
... print("yep")
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'datetime', 'dill', 'foo', 'functionWithUglyName', 'json', 'numpy', 'pandas', 'pickle']
Well. That __main__ stuff looks more populated.
Now let's save the session and exit the Repl:
>>> dill.dump_session('session_01')
>>> exit()
What happens when we load the session with dill.load_session()?
Let's open another Repl to discover it:
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
Ok. Just another empty python interpreter...
Let's load the session and see what happens:
>>> import dill
>>> dill.load_session('session_01')
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'datetime', 'dill', 'foo', 'functionWithUglyName', 'json', 'numpy', 'pandas', 'pickle']
It loaded the contents of __main__ as expected.
Wait a second.
It loaded the functionWithUglyName we defined before.
Is it real?
>>> functionWithUglyName()
yep
Turns out that dill is really good at serializing stuff.
Most of the time you'll just need to pickle some data, but dill can do much more... and it is great for debugging and testing.
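One thing dill handles that plain pickle does not, for example, is a lambda (a small sketch; the variable names are made up):
import pickle
import dill

double = lambda n: n * 2

try:
    pickle.dumps(double)          # plain pickle refuses lambdas
except Exception as exc:
    print('pickle failed:', exc)

restored = dill.loads(dill.dumps(double))
print(restored(21))               # 42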
I am working in Jupyter notebook. I created a simple module called conv.py for converting miles to km. When I try to import this module in separate code (in the same directory), the import seems to succeed, but neither of the functions I defined in the 'conv' module is recognized.
I have imported os and os.getcwd() provides the correct folder for conv.py...
code for conv.py
in_n_ft = 12
ft_n_mile = 5280
m_n_km = 1000
cm_n_in = 2.54
cm_n_m = 100
mm_n_m = 1000
def ft_to_km(feet):
    return feet*in_n_ft*cm_n_in/cm_n_m/m_n_km

print(ft_to_km(5280))

def mil_to_km(mile):
    return mile*ft_n_mile*in_n_ft*cm_n_in/cm_n_m/m_n_km

print(mil_to_km(3.2))
Code for new module
import conv
km = conv.mil_to_km(5)
Error provided
AttributeError Traceback (most recent call last)
<ipython-input-111-bfd778724ae2> in <module>
3 import conv
4
----> 5 km = conv.mil_to_km(5)
AttributeError: module 'conv' has no attribute 'mil_to_km'
When I type
dir(conv)
I get
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__']
What am I doing wrong?
EDIT
I have also tried
from conv import mil_to_km
when I do that I get a different error
cannot import name 'mil_to_km' from 'conv' (C:\Users\223023441\Documents\python\conv.py)
I have also queried the module using:
from inspect import getmembers, isfunction
import conv
print(getmembers(conv, isfunction))
from here I get:
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__']
I am also unable to access any of the variables within the conv.py file after import... Am I doing something wrong when I save the py file? Jupyter uses ipynb as its default format; when I 'save as' to conv.py, is this screwing it up?
You should import from the module.
Try this:
from conv import mil_to_km
km = mil_to_km(5)
The reason is that when you import the module that way, you are executing it.
In the way I've shown, you are just importing the needed functions.
So the ultimate issue was the way I was saving the .py file. I was using the 'save as' command in Jupyter notebook and typing 'conv.py' as the file name. This showed up in the directory as a .py file, but my main file wasn't recognizing it properly. Once I downloaded the file as a .py file, cut it from my downloads folder, and pasted it into my working directory, everything worked.
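For anyone hitting the same trap: one way to create the module from inside Jupyter, avoiding 'save as' entirely, is the %%writefile cell magic, which writes the cell body to a real .py file in the working directory (a sketch with only the one function that matters here):
%%writefile conv.py
in_n_ft = 12
ft_n_mile = 5280
cm_n_in = 2.54
cm_n_m = 100
m_n_km = 1000

def mil_to_km(mile):
    return mile * ft_n_mile * in_n_ft * cm_n_in / cm_n_m / m_n_km
Then, in the next cell, reload in case an older conv was already imported:
import importlib
import conv
conv = importlib.reload(conv)   # pick up the freshly written file
print(conv.mil_to_km(5))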
1. Summary
I can't find how to add variables from YAML files to Python files without duplication.
2. Purpose
I use Pelican, a static site generator. It uses .py files for configuration. Problems:
I can't reuse variables from the .py files in JavaScript
import * is an antipattern, yet it is used even in the official Pelican blog
I tried moving the configuration to YAML files → I get the problem described in this question.
3. MCVE
3.1. Files
Live demo on Repl.it
main.py:
"""First Python file."""
# [INFO] Using ruamel.yaml — superset of PyYAML:
# https://stackoverflow.com/a/38922434/5951529
import ruamel.yaml as yaml
SETTINGS_FILES = ["kira.yaml", "kristina.yaml"]
for setting_file in SETTINGS_FILES:
    VARIABLES = yaml.safe_load(open(setting_file))
    # [INFO] Convert Python dictionary to variables:
    # https://stackoverflow.com/a/36059129/5951529
    locals().update(VARIABLES)
# [INFO] View all variables:
# https://stackoverflow.com/a/633134/5951529
print(dir())
publishconf.py:
"""Second Python file."""
import ruamel.yaml as yaml
# [NOTE] Another value in list
SETTINGS_FILES = ["kira.yaml", "katya.yaml"]
for setting_file in SETTINGS_FILES:
    VARIABLES = yaml.load(open(setting_file))
    locals().update(VARIABLES)
print(dir())
kira.yaml:
DECISION: Saint Petersburg
kristina.yaml:
SPAIN: Marbella
katya.yaml:
BURIED: Novoshakhtinsk
3.2. Expected behavior
DECISION and SPAIN variables in main.py:
$ python main.py
['DECISION', 'SETTINGS_FILES', 'SPAIN', 'VARIABLES', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__warningregistry__', 'setting_file', 'yaml']
DECISION and BURIED variables in publishconf.py:
$ python publishconf.py
['BURIED', 'DECISION', 'SETTINGS_FILES', 'VARIABLES', '__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__warningregistry__', 'setting_file', 'yaml']
3.3. Problem
Duplicate loop in main.py and publishconf.py:
for setting_file in SETTINGS_FILES:
    VARIABLES = yaml.load(open(setting_file))
    locals().update(VARIABLES)
Can I avoid this duplication?
4. Not helped
4.1. Configuration file
Live demo on Repl.it
config.py:
"""Config Python file."""
# [INFO] Using ruamel.yaml — superset of PyYAML:
# https://stackoverflow.com/a/38922434/5951529
import ruamel.yaml as yaml
MAIN_CONFIG = ["kira.yaml", "kristina.yaml"]
PUBLISHCONF_CONFIG = ["kira.yaml", "katya.yaml"]
def kirafunction(pelicanplugins):
    """Function for both Python files."""
    for setting_file in pelicanplugins:
        # [INFO] Convert Python dictionary to variables:
        # https://stackoverflow.com/a/36059129/5951529
        variables = yaml.safe_load(open(setting_file))
        globals().update(variables)

def main_function():
    """For main.py."""
    kirafunction(MAIN_CONFIG)

def publishconf_function():
    """For publishconf.py."""
    kirafunction(PUBLISHCONF_CONFIG)
main.py:
"""First Python file."""
import sys
from config import main_function
sys.path.append(".")
main_function()
# [INFO] View all variables:
# https://stackoverflow.com/a/633134/5951529
print(dir())
publishconf.py:
"""Second Python file."""
import sys
from config import publishconf_function
sys.path.append(".")
publishconf_function()
print(dir())
Variables from main_function and publishconf_function aren't shared across files:
$ python main.py
['__annotations__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'main_function', 'signal', 'sys']
4.2. Another attempts
Wrapping the loop in a function, as in this example:
def kirafunction():
    """Docstring."""
    for setting_file in SETTINGS_FILES:
        VARIABLES = yaml.safe_load(open(setting_file))
        locals().update(VARIABLES)

kirafunction()
Using the global keyword
“I think editing locals() like that is generally a bad idea. If you think globals() is a better alternative, think it twice!”
Search in Stack Overflow questions
I would avoid any update to what locals() returns, because the documentation explicitly states:
Note
The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
The mapping returned by globals(), on the other hand, simply contains the attributes of a module, and it is indeed writable.
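A tiny demonstration of the difference, runnable as a script in CPython (the names y and z are made up):
def demo():
    locals().update({'y': 1})   # silently ignored by CPython
    try:
        print(y)
    except NameError as exc:
        print(exc)              # name 'y' is not defined

demo()

globals().update({'z': 42})     # globals() really is writable
print(z)                        # 42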
So if this exists in one Python source:
def kirafunction(map, settings):
    # [NOTE] Another value in list
    for setting_file in settings:
        VARIABLES = yaml.load(open(setting_file))
        map.update(VARIABLES)
This can be used from any other Python source after importing the above function:
kirafunction(globals(), settings)
and will import the variables into the globals dictionary of the calling module. It will also be highly non-pythonic...
A slightly more Pythonic way would be to dedicate one Python module to holding both the code that loads the yaml files and the resulting variables:
loader.py:
import ruamel.yaml as yaml
SETTINGS_FILES = ["kira.yaml", "kristina.yaml"]
for setting_file in SETTINGS_FILES:
    VARIABLES = yaml.safe_load(open(setting_file))
    # [INFO] Convert Python dictionary to variables:
    # https://stackoverflow.com/a/36059129/5951529
    globals().update(VARIABLES)
Then from any other Python module you can use:
...
import loader # assuming that it is in sys.path...
...
print(loader.DECISION)
print(dir(loader))
But it is still uncommon and would require comments to explain the rationale for it.
After reading the Pelican config example from your comment, I assume that what you need is a way to import, in different scripts, a bunch of variables declared in yaml files. In that case I would put the code loading the variables in one module and update the globals() dictionary in the other modules:
loader.py:
import ruamel.yaml as yaml
MAIN_CONFIG = ["kira.yaml", "kristina.yaml"]
PUBLISHCONF_CONFIG = ["kira.yaml", "katya.yaml"]
def kirafunction(pelicanplugins):
    """Function for both Python files."""
    variables = {}
    for setting_file in pelicanplugins:
        # [INFO] Convert Python dictionary to variables:
        # https://stackoverflow.com/a/36059129/5951529
        variables.update(yaml.safe_load(open(setting_file)))
    return variables
Then for example in publishconf.py you would use:
from loader import kirafunction, PUBLISHCONF_CONFIG as pelican_config
# other Python code...
# import variables from the yaml files defined in PUBLISHCONF_CONFIG
# because Pelican expects them as plain Python module variables
globals().update(kirafunction(pelican_config))
Again, updating globals() is probably appropriate in this use case, but is generally frowned upon, hence the comment.
I'm trying to use the importlib library to verify whether the nmap library is installed on the computer executing the script, in Python 3.5.2.
I'm calling importlib.util.find_spec("nmap") but receive the following error.
>>> import importlib
>>> importlib.util.find_spec("nmap")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'
Can someone tell me where I'm going wrong?
EDIT
I was able to get the function to work using the following code.
#!/usr/bin/pythonw
import importlib
from importlib import util
#check to see if nmap module is installed
find_nmap = util.find_spec("nmap")
if find_nmap is None:
    print("Error")
Try this:
from importlib import util
util.find_spec("nmap")
I intend to investigate, but honestly I don't know why one works and the other doesn't. Also, observe the following interactive session:
>>> import importlib
>>> importlib.util
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'
>>> from importlib import util
>>> util
<module 'importlib.util' from '/usr/lib/python3.5/importlib/util.py'>
>>> importlib.util
<module 'importlib.util' from '/usr/lib/python3.5/importlib/util.py'>
So...yeah. I am sure this makes perfect sense to someone, but not to me. I will update once I figure it out.
Update:
Comparing this to something like:
>>> import datetime
>>> datetime
<module 'datetime' from '/usr/lib/python3.5/datetime.py'>
>>> datetime.datetime
<class 'datetime.datetime'>
I think the difference is that in this case the first datetime is a module and the second is a class, while in the importlib.util case both are modules. So perhaps module.module is not OK unless the code from both modules has been loaded, while module.class is OK, because the class code is loaded when the module is imported.
Update #2
Nope, it seems like in many cases module.module is fine. For example:
>>> import urllib
>>> urllib
<module 'urllib' from '/usr/lib/python3.5/urllib/__init__.py'>
>>> urllib.error
<module 'urllib.error' from '/usr/lib/python3.5/urllib/error.py'>
So perhaps it is something specific to importlib.
Update #3
As @kfb pointed out in the comments, it does seem to be related to importlib specifically. See the following comment from the __init__.py for importlib:
# Until bootstrapping is complete, DO NOT import any modules that attempt
# to import importlib._bootstrap (directly or indirectly). Since this
# partially initialised package would be present in sys.modules, those
# modules would get an uninitialised copy of the source version, instead
# of a fully initialised version (either the frozen one or the one
# initialised below if the frozen one is not available).
importlib/util.py does import importlib._bootstrap, so I would assume that this is related. If my understanding is correct, when you do import importlib the submodules will be initialized, but they are not initialized as attributes of the importlib module object that you have imported. At this point, if you do dir(importlib) you will not see util. Interestingly, after you have tried to access importlib.util and gotten an AttributeError, util (along with the other submodules) gets loaded/initialized, and now you can access importlib.util!
>>> import importlib
>>> dir(importlib)
['_RELOADING', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__import__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_bootstrap', '_bootstrap_external', '_imp', '_r_long', '_w_long', 'find_loader', 'import_module', 'invalidate_caches', 'reload', 'sys', 'types', 'warnings']
>>> importlib.util
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'importlib' has no attribute 'util'
>>> importlib.util
<module 'importlib.util' from '/usr/lib/python3.5/importlib/util.py'>
>>> dir(importlib)
['_RELOADING', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__import__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_bootstrap', '_bootstrap_external', '_imp', '_r_long', '_w_long', 'abc', 'find_loader', 'import_module', 'invalidate_caches', 'machinery', 'reload', 'sys', 'types', 'util', 'warnings']
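Whatever the underlying cause, a sketch of the robust pattern is simply to import the submodule explicitly, which sidesteps the quirk on every version:
import importlib.util   # binds 'importlib' and initializes the submodule

if importlib.util.find_spec("nmap") is None:
    print("nmap is not installed")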
I am not a professional programmer, and I'm just starting to study python.
I just figured out that just because I could do:
>>> from xml.etree.ElementTree import Element
>>> var = Element("Something")
doesn't mean I can do:
>>> import xml
>>> var = xml.etree.ElementTree.Element("Something")
In fact, doing this:
>>> import xml
>>> dir(xml)
['_MINIMUM_XMLPLUS_VERSION', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']
etree doesn't even appear as one of the methods provided by xml.
What is the relation between xml and xml.etree.ElementTree?
Why can I not see etree as one of xml's methods?
xml.etree is a child module of the xml package. It only shows up in xml's namespace after you import xml.etree. A parent package may import its child modules in its initialization (e.g. in 2.7, os imports os.path¹), but it's not required to do so.
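A short sketch of that visibility rule (runnable as a script):
import xml
print(hasattr(xml, 'etree'))    # False: the child isn't imported yet

import xml.etree.ElementTree
print(hasattr(xml, 'etree'))    # True: the import bound it on the parent
var = xml.etree.ElementTree.Element("Something")   # now this works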
Conversely, when you import a module from a package directly, the package is automatically imported first.
Strangely, I couldn't find any phrasing in the docs stating this. But a test shows that this is exactly what happens:
$ cat test/__init__.py
print "package init"
import traceback
traceback.print_stack()
$ cat test/module.py
print "module init"
import traceback
traceback.print_stack()
$ python
<...>
>>> import test.module
package init
File "<stdin>", line 1, in <module>
File "test\__init__.py", line 3, in <module>
traceback.print_stack()
module init
File "<stdin>", line 1, in <module>
File "test\module.py", line 3, in <module>
traceback.print_stack()
¹ This is not documented, though, so don't rely on it.