OK, so I know this has been asked before in many different threads, but I find myself still trying to resolve my doubts.
I have an application that allows the user to pass a dictionary of strings and 'choose' a backend function from a library to process it. The functions are part of a 'workflow' library and loaded by the system admin on the backend. Available functions are stored in the backend in a manifest file.
The mechanics are such that the users send the dictionary as JSON to the web app and nominate which function from the library should process it. The function is then loaded and executed via Python's exec() or eval() functions.
Before the execution, the requested function is checked against a list of available functions (whitelist) from the manifest file.
My basic question is: can whitelisting make exec() and eval() safe? Could it be made 'safer'?
If I understand it correctly, the function is trusted by the admin, and that makes it as safe as any Python module you install. Just make sure that the exec part is only done on the trusted code. Here is an example where functions with the same name as their file are loaded and executed.
import json
from pathlib import Path

# files named the same as the function they define, no .py extension
FUNCTION_DIR = Path("/my/functions/are/here")

def run_func(name, data):
    try:
        func_namespace = {}
        # execute the trusted file's source into a fresh namespace
        exec((FUNCTION_DIR / name).read_text(), func_namespace)
        # call the function of the same name with the JSON-encoded data
        return func_namespace[name](json.dumps(data))
    except Exception as e:
        return "Hey, what kind of game are you playing here? " + str(e)
The function is naturally whitelisted just because it's in the known-safe directory.
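To fold the manifest whitelist into the same flow, a minimal sketch could look like the following (the manifest path and its format, a JSON list of allowed names, are assumptions for illustration):

import json
from pathlib import Path

FUNCTION_DIR = Path("/my/functions/are/here")
MANIFEST_FILE = Path("/my/functions/manifest.json")  # hypothetical manifest location

def load_whitelist():
    # assume the manifest is a JSON list of allowed function names
    return set(json.loads(MANIFEST_FILE.read_text()))

def run_func_checked(name, data):
    if name not in load_whitelist():
        return "Hey, what kind of game are you playing here? unknown function"
    func_namespace = {}
    exec((FUNCTION_DIR / name).read_text(), func_namespace)
    return func_namespace[name](json.dumps(data))

The important property is that exec() only ever sees files the admin put in place; the whitelist check also stops path tricks such as passing '../something' as the name.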
Related
I have a python application ready to launch.
One thing that keeps nagging at my mind is that the application depends on the formal APIs of several sites to get data.
If one site changes its API, I have to change the code and ask users to reinstall the new application. That's a lot of work if several changes land at the same time.
I came across exec(), which can execute a string as a code snippet.
So, if it works well, I can save the critical code parts in an sqlite3 table.
In case of any change, I can ask users to do an OTA update from inside the application, which would just update the sqlite3 table and the code would work as usual.
But I just hit a snag: return doesn't work inside exec(); I just get a 'return' outside function exception. I don't know what other things will stop working if I use exec.
Working:
def func_dyn():
    if 1 == 1:
        return 1
    else:
        print('test')

if __name__ == '__main__':
    func_dyn()
Not Working:
global code
code = """if 1==1:
    return 1
else:
    print('test')
"""

def func_dyn():
    global code
    exec(code)

if __name__ == '__main__':
    func_dyn()
How do I handle return inside exec() when exec() is called from within a function? How should the code be formatted/handled?
Why do I need to put the whole code of a function into exec()?
Since there are many functions like this, I can't store lots of small snippets; that would make the code unreadable. So I was thinking of putting the whole function into a string.
Why do I need return?
If an exception arises, the function should return to the caller so the next step can execute.
exec() works in the current context, but it doesn't seem to run in the context of the calling method, so return was not possible. A workaround is to set a flag inside exec() and handle the return outside. But exec() was not a good candidate for my requirement. Instead, I have decided to properly incorporate update functionality into my code via my own standalone updater or via frameworks like pyupdater, esky, etc.
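For what it's worth, a minimal sketch of that flag-based workaround: the snippet assigns a result variable instead of using return (names are illustrative):

code = """
result = None
if 1 == 1:
    result = 1
else:
    print('test')
"""

def func_dyn():
    namespace = {}
    exec(code, namespace)           # run the snippet in its own namespace
    return namespace.get('result')  # read the value the snippet set

if __name__ == '__main__':
    print(func_dyn())  # prints 1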
I think your approach is very difficult to debug. What is the purpose of storing code in SQL anyway? Why can't you just prepare an external API file and update that when needed? It can simply be imported by programmatically adding a cache folder to your Python path, and it keeps your code where it belongs: in a .py file.
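A rough sketch of that idea, assuming a cache directory and a site_api.py file that the updater drops into it (both names are hypothetical):

import sys
from pathlib import Path

CACHE_DIR = Path.home() / ".myapp_cache"   # hypothetical updatable folder
sys.path.insert(0, str(CACHE_DIR))         # make its .py files importable

import site_api                            # site_api.py lives in CACHE_DIR

data = site_api.fetch_data()               # hypothetical function in the updated file

An OTA update then only has to overwrite site_api.py; the rest of the application keeps importing it by name.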
I am implementing a new module for specific needs in my environment. I would like this module to print certain outputs (such as some variables), similar to how the debug module prints with its msg parameter, but in a more customized way.
The AnsibleModule class has a fail_json() method which accepts a msg argument to print on failure, but I cannot find a way to print a message on success with exit_json().
I also don't know how the built-in debug module works; I found almost nothing except DOCUMENTATION and EXAMPLES in the module script.
Everything you want done on the Ansible controller is done by action plugins (they are modules' companions).
Take a look at some very simple plugin/module here.
You want to execute the module, inspect its result for your custom message, use display.v or display.warning or anything else to display that message, and then return the module's result back to Ansible core.
For this very reason debug is an action plugin, and its module only contains documentation, because all the work is done by the plugin itself.
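As a hedged sketch of what such an action plugin might look like (the module name and message handling are assumptions, not the actual debug implementation):

# action_plugins/my_module.py
from ansible.plugins.action import ActionBase
from ansible.utils.display import Display

display = Display()

class ActionModule(ActionBase):
    def run(self, tmp=None, task_vars=None):
        result = super(ActionModule, self).run(tmp, task_vars)
        # run the companion module on the target and merge in its result
        result.update(self._execute_module(module_name='my_module',
                                           module_args=self._task.args,
                                           task_vars=task_vars))
        # print a custom message on the controller on success
        if not result.get('failed'):
            display.v("my_module says: %s" % result.get('msg', ''))
        return result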
Say I create a simple web server using Flask and allow people to query certain things that I have split into different Python files, using the __import__ function. Would doing this with user-supplied information be considered a security risk?
Example:
from flask import Flask
app = Flask(__name__)

@app.route("/<author>/<book>/<chapter>")
def index(author, book, chapter):
    return getattr(__import__(author), book)(chapter)
    # OR
    return getattr(__import__("books." + author), book)(chapter)
I've seen a case like this recently when reviewing code, however it didn't feel right to me.
It is entirely insecure, and your system is wide open to attack. Your first return line doesn't limit what kind of names can be imported, which means the user can execute any arbitrary callable in any importable Python module.
That includes:
/pickle/loads/<url-encoded pickle data>
A pickle is a stack language that lets you execute arbitrary Python code, and the attacker can take full control of your server.
Even a prefixed __import__ would be insecure if an attacker can also place a file on your file system in the PYTHONPATH; all they need is a books directory earlier in the path. They can then use this route to have the file executed in your Flask process, again letting them take full control.
I would not use __import__ at all here. Just import those modules at the start and use a dictionary mapping author to the already imported module. You can use __import__ still to discover those modules on start-up, but you now remove the option to load arbitrary code from the filesystem.
Allowing untrusted data to direct calling arbitrary objects in modules should also be avoided (including getattr()). Again, an attacker that has limited access to the system could exploit this path to widen the crack considerably. Always limit the input to a whitelist of possible options (like the modules you loaded at the start, and per module, what objects can actually be called within).
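As a sketch of that whitelisted approach (the author modules and book functions here are made up for illustration):

from flask import Flask, abort
import books.austen
import books.dickens   # hypothetical author modules, imported up front

app = Flask(__name__)

# explicit whitelist: author -> (module, names that may be called on it)
AUTHORS = {
    "austen": (books.austen, {"persuasion", "emma"}),
    "dickens": (books.dickens, {"bleak_house"}),
}

@app.route("/<author>/<book>/<chapter>")
def index(author, book, chapter):
    if author not in AUTHORS:
        abort(404)
    module, allowed = AUTHORS[author]
    if book not in allowed:
        abort(404)
    return getattr(module, book)(chapter)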
More than being a security risk, it is simply a bad idea. For example, I could easily crash your web app by visiting the URL:
/sys/exit/anything
translating to:
...
getattr(__import__('sys'), 'exit')('anything')
Don't give your users the possibility to import/execute just about anything. Restrict the possibilities by using, say, a dictionary of permissible imports, as @MartijnPieters has clearly pointed out.
I am new to Python and recently started writing a script which essentially reads a MySQL database and archives some files by uploading them to Amazon Glacier. I am using the Amazon-provided boto module along with a few other modules.
I noticed that I seem to be replicating the same pattern over and over again when installing and taking advantage of these modules that connect to external services. First, I write a wrapper module which reads my global config values and then defines a connection function, then I start writing functions in that module which perform the various tasks. For example, at the moment, my boto wrapper module is named awsbox and it consists of functions like getConnection and glacierUpload. Here's a brief example:
import config, sys, os
import boto, uuid

_awsConfig = config.get()['aws']

def getGlacierConnection():
    return boto.connect_glacier(aws_access_key_id=_awsConfig['access_key_id'],
                                aws_secret_access_key=_awsConfig['secret_access_key'])

def glacierUpload(filePath):
    if not os.path.isfile(filePath):
        return False
    awsConnect = getGlacierConnection()
    vault = awsConnect.get_vault(_awsConfig['vault'])
    vault.upload_archive(filePath)
    return True
My question is, should I be writing these "wrapper" modules? Is this the Pythonic way to consume these third-party modules? This method makes sense to me but I wonder if creating these interfaces makes my code less portable or modular, or whether or not there is a better way to integrate these various disparate modules into my main script structure.
You're using the modules as intended. You import them and then use them. As I see it, awsbox is the module that holds the implementation of the functions that match your needs.
So, answering your questions:
Should I be writing these "wrapper" modules? Yes (though you can stop calling them "wrappers"); the error would be to rewrite the installed modules themselves.
Is this the Pythonic way to consume these third-party modules? It is the Python way. Authors write modules for you to use (import).
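For example, the main script can then stay very small; a hypothetical usage of the awsbox module from the question:

import awsbox   # the wrapper module described above

if awsbox.glacierUpload("/var/backups/db-archive.tar.gz"):   # hypothetical path
    print("Archive uploaded to Glacier")
else:
    print("File not found, nothing uploaded")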
How does one get (find the location of) the dynamically imported modules of a Python script?
So, Python, from my understanding, can dynamically (at run time) load modules.
Be it using __import__(module_name), using exec("from x import y"), or using imp.find_module("module_name") and then imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies of a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, whether they are loaded from hard-coded string objects or from strings returned by a function/method.
For a normal import module_name and from x import y you can either scan the code manually or use modulefinder.
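For the statically visible imports, a minimal modulefinder run looks roughly like this (the target script name is a placeholder):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("myscript.py")            # placeholder target script
for name, mod in finder.modules.items():
    print(name, getattr(mod, "__file__", None))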
So if I want to copy one python script and all its dependencies (including the custom dynamically loaded modules) how should I do that ?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further complicated by user input; consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting Problem; the second obviously can't be done.
From a theoretical perspective, you can never know exactly what modules are being imported or where from. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to see which files are read as modules are loaded. This won't give you every module that could possibly be loaded, but it will catch most modules for mostly sane code.
See also: How do I find the location of Python module sources?
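For instance, a quick way to see where an already imported module lives:

import json
print(json.__file__)   # e.g. /usr/lib/python3.x/json/__init__.py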
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
Whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
Get the module of that calling function and store the name of this module and what will get imported.
Redirect the call to the real __import__.
After you have done this you can call your application with python -m magic_module yourapp.py. The magic module must store the information somewhere where you can retrieve it later.
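A rough sketch of such a wrapper for Python 3, where __builtin__ has become builtins (the module name and record format are illustrative):

# magic_module.py -- hypothetical import-recording wrapper
import builtins

_real_import = builtins.__import__
recorded = []   # (importing module, imported name) pairs

def _tracking_import(name, globals=None, locals=None, fromlist=(), level=0):
    caller = (globals or {}).get("__name__", "?")   # who is doing the import
    recorded.append((caller, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracking_import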
That's quite a question.
Static analysis is about predicting all possible run-time execution paths, including whether the program halts at all for a specific input.
That is equivalent to the Halting Problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.