How to omit methods in cProfile - python

When I profile code which heavily uses the all/any builtins I find that the call graph as produced by profiling tools (like gprof2dot) can be more difficult to interpret, because there are many edges from different callers leading to the all/any node, and many edges leading away. I'm looking for a way of essentially omitting the all/any nodes from the call graph such that in this very simplified example the two code paths would not converge and then split.
import cProfile
import random
import time

def bad_true1(*args, **kwargs):
    time.sleep(random.random() / 100)
    return True

def bad_true2(*args, **kwargs):
    time.sleep(random.random() / 100)
    return True

def code_path_one():
    nums = [random.random() for _ in range(100)]
    return all(bad_true1(x) for x in nums)

def code_path_two():
    nums = [random.random() for _ in range(100)]
    return all(bad_true2(x) for x in nums)

def do_the_thing():
    code_path_one()
    code_path_two()

def main():
    profile = OmittingProfile()
    profile.enable()
    do_the_thing()
    profile.disable()
    profile.dump_stats("foo.prof")

if __name__ == "__main__":
    main()

I don't think cProfile provides a way to filter out functions while collecting. You can probably filter functions out manually after you've collected the stats, but that's more work than you want to do.
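If you do want to go that route, here is a minimal sketch using the standard pstats module. Note that it only restricts what gets printed in the report; it does not truly remove nodes from the dumped call graph that gprof2dot sees.

import pstats

# Load the dump written by cProfile and restrict the report to matching rows.
# The string arguments are regexes applied to file/function names.
p = pstats.Stats("foo.prof")
p.sort_stats("cumulative")
p.print_stats("code_path")    # only functions whose name matches the pattern
p.print_callees("code_path")  # what each matching function calls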
Also, in my experience, as soon as there are a lot of nested calls, cProfile is only really helpful for finding the "most time-consuming function", and that's it; there is no extra context, because cProfile only records a function's immediate caller, not the whole call stack. For complicated programs, that may not be super helpful.
For those reasons, I would recommend trying some other profiling tools, for example viztracer. VizTracer draws out your entire program execution so you know what happens in your program, and it happens to have the ability to filter out builtin functions.
pip install viztracer
# --ignore_c_function is the optional filter
viztracer --ignore_c_function your_program.py
vizviewer result.json
Of course, there are other statistical profilers that produce flame graphs, which contain less information but also introduce less overhead, like py-spy and Scalene.
cProfile is a good tool for some simple profiling, but it's definitely not the best tool on the market.

Related

How can I accept and run a user's code securely on my web app?

I am working on a Django-based web app that takes a Python file as input, containing some function. In the backend I have some lists that are passed as parameters to the user's function, which generates a single value as output. The generated result is then used for some further computation.
Here is how the function inside the user's file looks:
def somefunctionname(list):
    ''' some computation performed on list '''
    return float_value  # a single float
At present, the approach I am using is to take the user's file as a normal file upload. Then in my views.py I execute the file as a module and pass the parameters with the eval function. The snippet is given below.
Here modulename is the name of the Python file that I received from the user and imported as a module:
exec("import "+modulename)
result = eval(f"{modulename}.{somefunctionname}(arguments)")
This is working absolutely fine, but I know it is not a secure approach.
My question: is there any other way through which I can run the user's file securely, since the method I am using is not secure? I know the proposed solutions can't be foolproof, but what are the other ways in which I can do this (for example, if it can be solved with dockerization, what would the approach be, or are there external tools I could use with an API)?
Or, if possible, can somebody tell me how I can simply sandbox this, or point me to any tutorial that can help?
Any reference or resource will be helpful.
It is an important question. In Python, sandboxing is not trivial.
It is one of the few cases where it matters which Python interpreter you are using. For example, Jython generates Java bytecode, and the JVM has its own mechanism for running code securely.
For CPython, the default interpreter, there were originally some attempts to make a restricted execution mode, but they were abandoned a long time ago.
Currently, there is an unofficial project, RestrictedPython, that might give you what you need. It is not a full sandbox, i.e. it will not give you restricted filesystem access or anything like that, but for your needs it may be just enough.
Basically, the project rewrites Python compilation in a more restricted way.
What it allows you to do is compile a piece of code and then execute it, all in a restricted mode. For example:
from RestrictedPython import safe_builtins, compile_restricted

source_code = """
print('Hello world, but secure')
"""

byte_code = compile_restricted(
    source_code,
    filename='<string>',
    mode='exec'
)
exec(byte_code, {'__builtins__': safe_builtins})

>>> Hello world, but secure
Running with __builtins__ set to safe_builtins disables dangerous functions like opening files, import and so on. There are also other variations of builtins and other options; take some time to read the docs, they are pretty good.
EDIT:
Here is an example for your use case:
from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Executes user code in a restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # These are the variables we allow user code to see. `result` will contain the return value.
        restricted_locals = {
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }

        # If you want the user to be able to use some of your functions inside his code,
        # you should add those functions to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing, that's all.
        restricted_globals = {
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }

        # Add another line to the user code that executes `user_func`
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)

        # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whatever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise
Now you have a function execute_user_code that receives some unsafe code as a string, the name of a function from that code, and arguments, and returns the return value of that function with the given arguments.
Here is a very stupid example of some user code:
example = """
def test(x, name="Johny"):
return name + " likes " + str(x*x)
"""
# Lets see how this works
print(execute_user_code(example, "test", 5))
# Result: Johny likes 25
But here is what happens when the user code tries to do something unsafe:
malicious_example = """
import sys
print("Now I have the access to your system, muhahahaha")
"""
# Lets see how this works
print(execute_user_code(malicious_example, "test", 5))
# Result - evil plan failed:
# Traceback (most recent call last):
# File "restr.py", line 69, in <module>
# print(execute_user_code(malitious_example, "test", 5))
# File "restr.py", line 45, in execute_user_code
# exec(byte_code, restricted_globals, restricted_locals)
# File "<user_code>", line 2, in <module>
#ImportError: __import__ not found
Possible extension:
Pay attention that the user code is compiled on each call to the function. However, you may want to compile the user code once and then execute it with different parameters. All you have to do is save the byte_code somewhere, then call exec with a different set of restricted_locals each time.
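A minimal sketch of that idea, reusing the restricted_globals dictionary built in execute_user_code above (run_compiled is just a hypothetical helper name for illustration):

# Compile the user code once...
compiled = compile_restricted(
    user_code + "\nresult = {0}(*args, **kwargs)".format(user_func),
    filename="<user_code>",
    mode="exec",
)

# ...then execute it many times with different arguments.
def run_compiled(compiled, restricted_globals, *args, **kwargs):
    restricted_locals = {"result": None, "args": args, "kwargs": kwargs}
    exec(compiled, restricted_globals, restricted_locals)
    return restricted_locals["result"]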
EDIT2:
If you want to allow import, you can write your own import function that permits only the modules you consider safe. Example:
def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
        globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import  # Must be a part of builtins

restricted_globals = {
    "__builtins__": safe_builtins,
    "_getitem_": default_guarded_getitem,
    "_apply_": _apply,
}
....
i_example = """
import math
def myceil(x):
return math.ceil(x)
"""
print(execute_user_code(i_example, "myceil", 1.5))
Note that this sample import function is VERY primitive; it will not work with things like from x import y. You can look here for a more complex implementation.
EDIT3
Note that a lot of Python's built-in functionality is not available out of the box in RestrictedPython. That does not mean it is not available at all, but you may need to implement some function for it to become available.
Even some obvious things like sum or the += operator are not obvious in the restricted environment.
For example, the for loop uses a _getiter_ function that you must implement and provide yourself (in globals). Since you want to avoid infinite loops, you may want to put some limit on the number of iterations allowed. Here is a sample implementation that limits the number of iterations to 100:
MAX_ITER_LEN = 100

class MaxCountIter:
    def __init__(self, dataset, max_count):
        self.i = iter(dataset)
        self.left = max_count

    def __iter__(self):
        return self

    def __next__(self):
        if self.left > 0:
            self.left -= 1
            return next(self.i)
        else:
            raise StopIteration()

def _getiter(ob):
    return MaxCountIter(ob, MAX_ITER_LEN)

....

restricted_globals = {
    "_getiter_": _getiter,

....

for_ex = """
def sum(x):
    y = 0
    for i in range(x):
        y = y + i
    return y
"""

print(execute_user_code(for_ex, "sum", 6))
If you don't want to limit the loop count, just use the identity function as _getiter_:
restricted_globals = {
    "_getiter_": lambda x: x,
Note that simply limiting the loop count does not guarantee security. First, loops can be nested. Second, you cannot limit the execution count of a while loop. To make it secure, you have to execute unsafe code under some timeout.
Please take a moment to read the docs.
Note that not everything is documented (although many things are). You have to learn to read the project's source code for more advanced things. The best way to learn is to try to run some code, see what kind of function is missing, and then read the project's source code to understand how to implement it.
EDIT4
There is still another problem - restricted code may have infinite loops. To avoid it, some kind of timeout is required on the code.
Unfortunately, since you are using Django, which is multi-threaded unless you explicitly specify otherwise, the simple trick for timeouts using signals will not work here; you have to use multiprocessing.
Easiest way in my opinion - use this library. Simply add a decorator to execute_user_code so it will look like this:
@timeout_decorator.timeout(5, use_signals=False)
def execute_user_code(user_code, user_func, *args, **kwargs):
And you are done. The code will never run for more than 5 seconds.
Pay attention to use_signals=False; without it the decorator may show some unexpected behavior in Django.
Also note that this is relatively heavy on resources (and I don't really see a way to overcome this). I mean, not really crazy heavy, but it is an extra process spawn. You should keep that in mind in your web server configuration - an API which allows executing arbitrary user code is more vulnerable to DDoS.
For sure, with Docker you can sandbox the execution if you are careful. You can restrict CPU cycles, cap memory, close all network ports, and run as a user with read-only access to the file system, and so on.
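As a very rough sketch of that idea from Python (the image name, limits, paths and timeout here are made up for illustration; real hardening needs much more care):

import subprocess

def run_in_container(path_to_user_file):
    # Run the user's script in a locked-down container: no network,
    # capped memory/CPU, read-only filesystem, and a hard timeout.
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", "128m",
        "--cpus", "0.5",
        "--read-only",
        "-v", "{0}:/sandbox/user_code.py:ro".format(path_to_user_file),
        "python:3.11-slim",
        "python", "/sandbox/user_code.py",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=10)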
Still, this would be extremely complex to get right, I think. In my opinion, you should not allow a client to execute arbitrary code like that.
My advice would be to check whether a product/solution for this already exists and use that. I was thinking of the sites that allow you to submit some code (Python, Java, whatever) that is executed on the server.

Optimizing a multithreaded numpy array function

Given 2 large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that would return indices from "destination" which match elements of "source" as their closest, with this limitation: I can only use numpy... so no scipy, pandas, numexpr, cython...
To do this I wrote a function based on the "brute force" answer to this question. I iterate over elements of source, find the closest element from destination and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both threaded and unthreaded functions and how they compare in speed on an 8-core machine.
import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
from multiprocessing.pool import ThreadPool

def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations - point)  # delta between destinations and given point
        d = inner1d(dlt, dlt)         # get distances
        return np.argmin(d)           # return closest index

    # Multithread!
    p = ThreadPool()
    return p.map(worker, sources)

def unthreaded(sources, destinations):
    results = []
    #for p in sources:
    for i in range(len(sources)):
        dlt = (destinations - sources[i])  # difference between destinations and given point
        d = inner1d(dlt, dlt)              # get distances
        results.append(np.argmin(d))       # append closest index
    return results

# Setup the data
n_destinations = 10000  # 10k random destinations
n_sources = 10000       # 10k random sources
destinations = np.random.rand(n_destinations, 3) * 100
sources = np.random.rand(n_sources, 3) * 100

# Compare!
print 'threaded: %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0]
print 'unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0]
Results:
threaded: 0.894030461056
unthreaded: 1.97295164054
Multithreading seems beneficial, but I was hoping for more than a 2x speedup, given that the real-life datasets I deal with are much larger.
All recommendations to improve performance (within the limitations described above) will be greatly appreciated!
Ok, I've been reading the Maya documentation on Python and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread-safe methods).
Given the above, I'd say it's better to avoid threads. Because of the GIL, this is a common problem and there are several ways to work around it:
Try to build a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd try to get SIP to work first, and then move on.
Use multiprocessing. Even if your custom Python distribution doesn't include it, you can get to a working version since it's all pure Python code. multiprocessing is not affected by the GIL since it spawns separate processes; see the sketch below.
The above should've worked out for you. If not, try another parallel tool (after some serious praying).
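For example, here is a rough sketch of the multiprocessing route applied to the nearest-neighbour search above. I've swapped the deprecated inner1d for an equivalent einsum call, and the process/chunk counts are arbitrary:

import numpy as np
from multiprocessing import Pool

def closest_chunk(args):
    # For each point in the chunk, return the index of the nearest destination
    # (squared euclidean distance, computed row-wise with einsum).
    chunk, destinations = args
    out = []
    for point in chunk:
        dlt = destinations - point
        out.append(int(np.argmin(np.einsum('ij,ij->i', dlt, dlt))))
    return out

def closest_indices(sources, destinations, n_procs=4):
    chunks = np.array_split(sources, n_procs)
    p = Pool(n_procs)
    parts = p.map(closest_chunk, [(c, destinations) for c in chunks])
    p.close()
    p.join()
    return [i for part in parts for i in part]

if __name__ == "__main__":
    destinations = np.random.rand(10000, 3) * 100
    sources = np.random.rand(10000, 3) * 100
    print(closest_indices(sources, destinations)[:10])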
On a side note, if you're using outside modules, be mindful of matching Maya's Python version. This may be the reason you couldn't build scipy. Of course, scipy has a huge codebase and the Windows platform is not the most resilient for building stuff.

Multiple CPU usage when accessing data attached to traited classes

I have an application that uses a number of classes inheriting from HasTraits. Some of these classes manage access to data and others provide functions for analyzing that data. This works wonderfully for a GUI -- I can check that the data and analysis code is doing what it should. However, I've noticed that when I use these classes for GUI-less computations, all the CPUs on the system end up getting used.
Here is a small example that shows the cpu usage:
from traits.api import HasTraits, List, Int, Enum, Instance
import numpy as np
import psutil
from itertools import combinations

"""
Small example of high CPU usage by traited classes
"""

class DataStorage(HasTraits):
    nsamples = Int(2000)
    samples = List

    def _samples_default(self):
        return np.random.randn(self.nsamples, 2000).tolist()

    def sample_samples(self, indices):
        """ return a 2D array of data at indices """
        return np.array(
            [self.samples[i] for i in indices])

class DataAccessor(HasTraits):
    """ Class that grabs data and computes something """
    measure = Enum("correlation", "covariance")
    data_source = Instance(DataStorage, ())

    def compute_measure(self, indices):
        """ example of some computation """
        samples = self.data_source.sample_samples(indices)
        percentage = psutil.cpu_percent(interval=0, percpu=True)
        if self.measure == "correlation":
            result = np.corrcoef(samples)
        elif self.measure == "covariance":
            result = np.cov(samples)
        return percentage

# Run a simulation to see cpu usage
analyzer = DataAccessor()
usage = []
n_iterations = 0
max_iterations = 500
for combo in combinations(np.arange(2000), 500):
    # evaluate the measurement on a subset of the data
    usage.append(analyzer.compute_measure(combo))
    n_iterations += 1
    if n_iterations > max_iterations:
        break

print n_iterations
use_percents = np.array(usage).T
When I run this on an 8-cpu machine running CentOS, top reports the python process at roughly 600%.
>>> use_percents.mean(1)
shows
array([ 67.05548902, 67.06906188, 66.89041916, 67.28942116,
66.69421158, 67.61437126, 99.8007984 , 67.31996008])
Question:
My computation is embarrassingly parallel, so it would be great to have the other CPUs available to split up the job. Does anyone know what's happening here? A plain Python version of this uses 100% of a single CPU.
Is there a way to keep everything local to a single CPU without rewriting all my classes to avoid traits?
Traits is not causing the CPU usage. It's easy to rewrite this bit of code without Traits, and you will see that you get the same pattern of CPU usage (at least, I do).
Instead, what you are probably seeing is the CPU usage of the BLAS library that your build of numpy is linked against. numpy.corrcoef() calls numpy.cov(), and much of the computation in numpy.cov() is taken up by a numpy.dot() call, which does a matrix-matrix multiplication using BLAS. If it is an optimized BLAS library, it will usually use non-Python threads internally to split up these computations among your CPUs. You will have to consult the documentation of your optimized BLAS library to find out how to change this.
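If you just want to keep the computation on a single CPU, one common knob (depending on which BLAS your numpy build links against, e.g. OpenBLAS, MKL or ATLAS) is to cap the BLAS thread pool via environment variables before numpy is imported. A small sketch:

import os

# Which variable matters depends on the BLAS implementation numpy is linked
# against; setting several is harmless.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # import numpy only after the limits are in place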

Parallel mapping functions in IPython w/ multiple parameters

I'm trying to use IPython's parallel environment and so far it's looking great, but I'm running into a problem. Let's say I have a function, defined in a library,
def func(a, b):
    ...
that I use when I want to evaluate it for one value of a and a bunch of values of b.
[func(myA, b) for b in myLongList]
Obviously, the real function is more complicated, but the essence of the matter is that it takes multiple parameters and I'd like to map over only one of them. The problem is that map, @dview.parallel, etc. map over all the arguments.
So let's say I want to get the answer to func(myA, myLongList). The obvious way to do this is to curry, either with functools.partial or just as
dview.map_sync(lambda b: func(myA, b), myLongList)
However, this does not work correctly on remote machines. The reason is that when the lambda expression is pickled, the value of myA is not included and instead, the value of myA from the local scope on the remote machine is used. When closures get pickled, the variables they close over don't.
Two ways I can think of doing this that will actually work are to manually construct lists for every argument and have map work over all of the arguments,
dview.map_sync(func, [myA]*len(myLongList), myLongList)
or to horrifically use the data as default arguments to a function, forcing it to get pickled:
# Can't use a lambda here b/c lambdas don't use default arguments :(
def parallelFunc(b, myA=myA):
    return func(myA, b)

dview.map_sync(parallelFunc, myLongList)
Really, this all seems horribly contorted when the real function takes a lot of parameters and is more complicated. Is there some idiomatic way of doing this? Something like
@parallel(mapOver='b')
def bigLongFn(a, b):
    ...
but as far as I know, nothing like the 'mapOver' thing exists. I probably have an idea of how to implement it ... this just feels like a very basic operation that there should exist support for so I want to check if I'm missing something.
I can improve a bit on batu's answer (which I think is a good one, but doesn't perhaps document in as much detail WHY you use those options). The ipython documentation is also currently woefully inadequate on this point. So your function is of the form:
def myfxn(a, b, c, d):
    ....
    return z
and stored in a file called mylib. Let's say b, c, and d are the same during your run, so you write a lambda function to reduce it to a 1-parameter function.
import mylib
mylamfxn=lambda a:mylib.myfxn(a,b,c,d)
and you want to run:
z=dview.map_sync(mylamfxn, iterable_of_a)
In a dream world, everything would magically work like that. However, first you'd get an error of "mylib not found," because the ipcluster processes haven't loaded mylib. Make sure the ipcluster processes have "mylib" in their python path and are in the correct working directory for myfxn, if necessary. Then you need to add to your python code:
dview.execute('import mylib')
which runs the import mylib command on each process. If you try again, you'll get an error along the lines of "global variable b not defined" because while the variables are in your python session, they aren't in the ipcluster processes. However, python provides a method of copying a group of variables to the subprocesses. Continuing the example above:
mydict=dict(b=b, c=c, d=d)
dview.push(mydict)
Now all of the subprocesses have access to b, c, and d. Then you can just run:
z=dview.map_sync(mylamfxn, iterable_of_a)
and it should now work as advertised. Anyway, I'm new to parallel computing with python, and found this thread useful, so I thought I'd try to help explain a few of the points that confused me a bit....
The final code would be:
import mylib
#set up parallel processes, start ipcluster from command line prior!
from IPython.parallel import Client
rc=Client()
dview=rc[:]
#...do stuff to get iterable_of_a and b,c,d....
mylamfxn=lambda a:mylib.myfxn(a,b,c,d)
dview.execute('import mylib')
mydict=dict(b=b, c=c, d=d)
dview.push(mydict)
z=dview.map_sync(mylamfxn, iterable_of_a)
This is probably the quickest and easiest way to make pretty much any embarrassingly parallel code run parallel in python....
UPDATE: You can also use dview to push all the data without loops, and then use an lview (i.e. lview = rc.load_balanced_view(); lview.map(...)) to do the actual calculation in a load-balanced fashion.
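A rough sketch of that variant, reusing the names from the example above:

# Push the shared data once to all engines, then let the load-balanced
# view schedule the per-item work across engines.
dview.push(dict(b=b, c=c, d=d))
dview.execute('import mylib')

lview = rc.load_balanced_view()
z = lview.map_sync(mylamfxn, iterable_of_a)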
This is my first message to StackOverflow so please be gentle ;) I was trying to do the same thing, and came up with the following. I am pretty sure this is not the most efficient way, but seems to work somewhat. One caveat for now is that for some reason I only see two engines working at 100%, the others are sitting almost idle...
In order to call a multiple-argument function with map, I first wrote this routine in my personal parallel.py module:
def map(r, func, args=None, modules=None):
    """
    Before you run parallel.map, start your cluster (e.g. ipcluster start -n 4)

    map(r, func, args=None, modules=None):
        args=dict(arg0=arg0,...)
        modules='numpy, scipy'

    examples:
        func = lambda x: numpy.random.rand()**2.
        z = parallel.map(r_[0:1000], func, modules='numpy, numpy.random')
        plot(z)

        A = ones((1000,1000));
        l = range(0,1000)
        func = lambda x: A[x,l]**2.
        z = parallel.map(r_[0:1000], func, dict(A=A, l=l))
        z = array(z)
    """
    from IPython.parallel import Client
    mec = Client()
    mec.clear()
    lview = mec.load_balanced_view()
    for k in mec.ids:
        mec[k].activate()
        if args is not None:
            mec[k].push(args)
        if modules is not None:
            mec[k].execute('import ' + modules)
    z = lview.map(func, r)
    out = z.get()
    return out
As you can see, the function takes an args parameter, which is a dict of parameters in the head node's workspace. These parameters are then pushed to the engines. At that point they become local objects and can be used in the function directly. For example, in the last example given in the docstring above, the A matrix is sliced using the engine-local variable l.
I must say that even though the above function works, I am not 100% happy with it at the moment. If I can come up with something better, I will post it here.
UPDATE:2013/04/11
I made minor changes to the code:
- The activate statement was missing brackets, causing it not to run.
- Moved mec.clear() to the top of the function, as opposed to the end.
I also noticed that it works best if I run it within ipython. For example, I may get errors if I run a script using the above function as "python ./myparallelrun.py" but not if I run it within ipython using "%run ./myparallelrun.py". Not sure why...
An elegant way to do this is with partial functions.
If you know that you want the first argument of foo to be myArg, you can create a new function bar by
from functools import partial
bar = partial(foo, myArg)
bar(otherArg) will then return foo(myArg,otherArg)
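Applied to the question, and assuming func lives in a module the engines can import (partial objects only pickle if the wrapped function is importable, not defined in __main__), that might look like this sketch:

from functools import partial

# myA is baked into the partial object and gets pickled along with it.
parallelFunc = partial(func, myA)
results = dview.map_sync(parallelFunc, myLongList)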
Let's build on that:
dview.map_sync(func, [myA]*len(myLongList), myLongList)
maybe the following would work:
from itertools import izip_longest
dview.map_sync(func, izip_longest(myLongList, [], fillvalue=myA))
example:
>>> # notice that a is a tuple
... concat = lambda a: '%s %s' % a
>>> mylonglist = range(10)
>>> from itertools import izip_longest
>>> map(concat, izip_longest(mylonglist, [], fillvalue='mississippi'))
['0 mississippi', '1 mississippi', '2 mississippi', '3 mississippi',
'4 mississippi', '5 mississippi', '6 mississippi', '7 mississippi',
'8 mississippi', '9 mississippi']
I am posting Alex S.'s comment as an answer, as it is probably the right approach for this problem:
Just do partial application with a lambda. I know it looks weird, but using my_f = lambda a, my=other, arguments=go, right=here: f(a, my, arguments, right) is the simplest way to go about it without running into pickling and pushing problems.

What is the easiest way to generate a Control Flow-Graph for a method in Python?

I am writing a program that tries to compare two methods. I would like to generate control flow graphs (CFGs) for all matched methods and use a topological sort to compare the two graphs.
RPython, the translation toolchain behind PyPy, offers a way of grabbing the flow graph (in the pypy/rpython/flowspace directory of the PyPy project) for type inference.
This works quite well in most cases but generators are not supported. The result will be in SSA form, which might be good or bad, depending on what you want.
There's a Python package called staticfg which does exactly this -- generation of control flow graphs from a piece of Python code.
For instance, putting the first quick sort Python snippet from Rosetta Code in qsort.py, the following code generates its control flow graph.
from staticfg import CFGBuilder
cfg = CFGBuilder().build_from_file('quick sort', 'qsort.py')
cfg.build_visual('qsort', 'png')
Note that it doesn't seem to understand more advanced control flow like comprehensions.
I found that py2cfg has a better representation of the control flow graph (CFG) than the one from staticfg.
https://gitlab.com/classroomcode/py2cfg
https://pypi.org/project/py2cfg/
Let's take this function in Python:
def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib_gen = fib()
for _ in range(10):
    next(fib_gen)
Image from StaticCFG:
Image from PY2CFG:
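py2cfg started out as a fork of staticfg, so as far as I can tell the same builder API should work; treat the names below as an assumption and check the project docs if they differ. Assuming the snippet above is saved as fib.py:

from py2cfg import CFGBuilder  # assumed to mirror staticfg's builder API

cfg = CFGBuilder().build_from_file('fib', 'fib.py')
cfg.build_visual('fib_cfg', 'png')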
http://pycallgraph.slowchop.com/ looks like what you need.
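Note that pycallgraph traces run-time call graphs rather than static control flow. If that is enough for you, a minimal sketch of its documented usage, reusing the fib generator from the snippet above (the output file name is illustrative):

from pycallgraph import PyCallGraph
from pycallgraph.output import GraphvizOutput

# Trace everything called inside the with-block and render it via graphviz.
with PyCallGraph(output=GraphvizOutput(output_file='callgraph.png')):
    fib_gen = fib()
    for _ in range(10):
        next(fib_gen)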
The Python trace module also has a --trackcalls option that can be an entry point for the call-tracing machinery in the stdlib.
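From Python, the same machinery is available via trace.Trace with countcallers enabled; a small sketch, again assuming the fib generator from the snippet above is defined:

import trace

def run_fib():
    fib_gen = fib()
    for _ in range(10):
        next(fib_gen)

# countcallers corresponds to the --trackcalls CLI flag.
tracer = trace.Trace(count=False, trace=False, countcallers=True)
tracer.runfunc(run_fib)

# Prints the observed caller -> callee relationships.
tracer.results().write_results(show_missing=False)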
