I was reading the wiki page for Numba, and it says Numba is a "compiler". But then later on, it says that to use Numba, you import it like a package. I later looked up how to use Numba, and indeed, you just pip install it.
So now I am confused. I thought Numba was a compiler? But it seems to be used just like any other package, like numpy or pandas? What's the difference?
A compiler is a program that inputs something in human-readable form (usually a program in a specified language) and outputs a functionally equivalent stream in another, more machine-digestible form. Just as with any other transformation, it's equally viable as a command-line invocation or a function call.
As long as it's wrapped properly in a package for general use, it's perfectly reasonable to deliver a compiler as a Python package.
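As an illustration (my own, not part of the original answer), Python itself ships a compiler that you invoke as an ordinary function call: the built-in compile() turns source text into a bytecode object that exec() can run.
source = "print('hello')"
code = compile(source, "<string>", "exec")  # compile the source string to a bytecode object
exec(code)  # run the compiled bytecode: prints "hello"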
Does that clear up the difficulty?
From what I have read in the Numba documentation, it's a package that you import into your project and then use the Numba decorator to indicate the parts of your code that you would like to have JIT (Just-in-Time) compiled in order to optimize them, as in the following example:
from numba import jit
import random

@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
When monte_carlo_pi is first called, Numba compiles it to optimized machine code on the fly, so there isn't a separate compilation step that you run yourself.
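A small usage sketch (my own, not from the answer) that makes the JIT behaviour visible: the first call pays the compilation cost, later calls reuse the already-compiled machine code.
import time

start = time.time()
monte_carlo_pi(1000000)  # first call: Numba compiles the function, then runs it
print('first call: ', time.time() - start)

start = time.time()
monte_carlo_pi(1000000)  # second call: already compiled, typically much faster
print('second call:', time.time() - start)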
Related
I was wondering if anyone has an idea of how I might vectorize the following loop:
for i in range(1, (T * n) + 1):
    Y = Y + np.diag(mu) @ Y * dt + np.multiply(np.diag(sigma) @ Y, L @ np.random.normal(0, dt, (d, N)))
where the following terms are already d×N matrices (I already vectorized another loop to produce them):
Y (this is the recursive parameter)
np.diag(mu) @ Y * dt
np.diag(sigma) @ Y
L @ np.random.normal(0, dt, (d, N))
Any help would be much appreciated. :)
With best regards!
Unfortunately, this doesn't look like vectorizable code:
Iterations should be independent. Vectorization typically means performing several iterations at once, and it usually also implies using AVX, SSE or FMA instructions (if we talk about x86 processors) so that iterations run truly in parallel at the hardware level.
Continuing on vector assembly instructions: that level of optimization is typically unreachable from Python code because the interpreter isn't that smart. Each iteration here is also doing too much to be vectorized. It actually contains sub-loops! We don't see them, but the matrix multiplications involve more loops under the hood.
So I wouldn't call optimizing this loop "vectorization". But luckily, there are still things to check:
Profile it. Find out what part of the computation consumes most of the time.
Verify that np.random doesn't slow down the program significantly. If it does, you can rely on pre-generated values instead.
Check whether the code that can be vectorized actually is. That means verifying that your numpy is built with SSE/AVX support and that matrix multiplications use it under the hood. It can be a bit tricky to do, but up to x4 speedups* are possible with AVX usage.
If parts of the code are indeed vectorized on the assembly level, switching to storing data in float16 arrays can make it even faster. To my knowledge, AVX does support operations on large blocks of 16-bit floats.
Rewrite it in C/Cython or try out Numba JIT compilation for the same task, as in the sketch at the end of this answer.
If even Numba compilation is not an option, I wonder if TensorFlow can help here. With TensorFlow, Python code doesn't kick off computations immediately but rather constructs a computational graph that is then executed without returning to the interpreter level. TensorFlow does support AVX and SSE (although not without pain), so you may expect more control over low-level details than with numpy. And you can also try to run it on a GPU.
Last thing, I don't quite believe in it, but does loop unrolling help?
for i in range(1, (T * n + 1) // 4):
    Y = Y + ...
    Y = Y + ...
    Y = Y + ...
    Y = Y + ...
* - subject to Amdahl's law
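A hedged sketch of the Numba route (my own code, not part of the answer), which also pre-generates the random increments as suggested above; it assumes T, n, dt, d, N, mu, sigma, L and Y are defined as in the question:
import numpy as np
from numba import njit

@njit
def evolve(Y, diag_mu, diag_sigma, L, noise, dt):
    # same recursion as the question's loop, with the random draws passed in
    for i in range(noise.shape[0]):
        Y = Y + diag_mu @ Y * dt + (diag_sigma @ Y) * (L @ noise[i])
    return Y

noise = np.random.normal(0, dt, (T * n, d, N))  # all random increments, generated once
Y = evolve(Y, np.diag(mu), np.diag(sigma), L, noise, dt)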
The basic question is this: let's say I was writing R functions which call Python via rPython, and I want to integrate this into a package. That's simple: it's irrelevant that the R function wraps around Python, and you proceed as usual, e.g.
# trivial example
# library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x+y")
  result <- python.get("result")
  return(result)
}
But what if the Python code inside the R function requires users to import Python libraries first? e.g.
# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
If you run this without importing the library numpy, you will get an error.
How do you import a Python library in an R package? How do you comment it with roxygen2?
It appears the R standard is this:
# R function that calls Python via rPython
# *this version imports numpy itself each time it is called
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('import numpy as np')
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
Each time you run the R function, you import the entire Python library again.
As @Spacedman and @DirkEddelbuettel suggest, you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that will typically always be required by users of your package.
You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a regress problem, because you need to import sys in order to perform the test, and (b) the answers to that question suggest that, at least in terms of performance, it shouldn't matter, e.g.
If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.
(although admittedly there is some quibbling discussion elsewhere on that page about possible scenarios where there could be a performance cost).
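For reference, a minimal sketch of such a test (my own illustration, not from the linked answers); this is the Python side that python.exec would have to run:
import sys  # needed just to perform the check

if 'numpy' not in sys.modules:
    import numpy as np  # first time: actually import it
else:
    np = sys.modules['numpy']  # already imported: just rebind the name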
But maybe your concern is stylistic rather than performance-oriented ...
Given 2 large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that, for each element of "source", returns the index of its closest element in "destination", with this limitation: I can only use numpy... So no scipy, pandas, numexpr, cython...
To do this I wrote a function based on the "brute force" answer to this question. I iterate over the elements of source, find the closest element from destination and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.
import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
from multiprocessing.pool import ThreadPool

def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations - point)  # delta between destinations and given point
        d = inner1d(dlt, dlt)         # get distances
        return np.argmin(d)           # return closest index
    # Multithread!
    p = ThreadPool()
    return p.map(worker, sources)

def unthreaded(sources, destinations):
    results = []
    # for p in sources:
    for i in range(len(sources)):
        dlt = (destinations - sources[i])  # difference between destinations and given point
        d = inner1d(dlt, dlt)              # get distances
        results.append(np.argmin(d))       # append closest index
    return results

# Setup the data
n_destinations = 10000  # 10k random destinations
n_sources = 10000       # 10k random sources
destinations = np.random.rand(n_destinations, 3) * 100
sources = np.random.rand(n_sources, 3) * 100

# Compare!
print 'threaded: %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0]
print 'unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0]
Results:
threaded: 0.894030461056
unthreaded: 1.97295164054
Multithreading seems beneficial, but I was hoping for more than a 2X speedup, given that the real-life datasets I deal with are much larger.
All recommendations to improve performance (within the limitations described above) will be greatly appreciated!
Ok, I've been reading the Maya documentation on Python and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread safe methods)
Given the above, I'd say it's better to avoid threads. The GIL is a common problem, and there are several ways to work around it:
Try to build a C/C++ extension. Once that is done, use threads in C/C++. Personally, I'd only try to get SIP to work, and then move on.
Use multiprocessing. Even if your custom Python distribution doesn't include it, you can get a working version since it's all pure Python code. multiprocessing is not affected by the GIL since it spawns separate processes (see the sketch after this list).
The above should've worked out for you. If not, try another parallel tool (after some serious praying).
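A hedged sketch of the multiprocessing route (my own code, not the answer author's): it splits the "sources" array from the question into chunks and searches each chunk in a separate process; the chunk count and the use of np.einsum for squared distances are my assumptions.
import numpy as np
from multiprocessing import Pool

_destinations = None  # set in each worker process by the initializer

def _init(dest):
    global _destinations
    _destinations = dest

def _closest_indices(chunk):
    # for every point in the chunk, find the index of the nearest destination
    out = []
    for point in chunk:
        dlt = _destinations - point          # (N, 3) differences
        d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances
        out.append(int(np.argmin(d)))
    return out

if __name__ == '__main__':
    sources = np.random.rand(10000, 3) * 100
    destinations = np.random.rand(10000, 3) * 100
    chunks = np.array_split(sources, 8)      # one chunk per worker (8 assumed)
    pool = Pool(processes=8, initializer=_init, initargs=(destinations,))
    results = [i for part in pool.map(_closest_indices, chunks) for i in part]
    pool.close()
    pool.join()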
On a side note, if you're using outside modules, be mindful of matching Maya's Python version. This may have been the reason why you couldn't build scipy. Of course, scipy has a huge codebase and Windows is not the most forgiving platform to build things on.
If the price charged for a crayon is p cents, then x thousand crayons
will be sold in a certain school store, where p(x) = 122 - x/34.
Using Python, calculate how many crayons must be sold to maximize
revenue.
I can solve this by hand easily; the only problem is how can I do it using plain Python? I am using IDLE (Python GUI). I am new to Python and haven't downloaded any external libraries. Any help will be greatly appreciated.
What I've done up to this point is
import math

def f(x):
    # price per crayon (in cents) when x thousand crayons are sold
    return (122 - (x / 34.0))

def g(x):
    # revenue: number sold times price
    return x * f(x)

def h(x):
    # derivative of the revenue g(x)
    return (122 - (2 * x / 34.0))
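To make the attempt concrete, a possible continuation (my own sketch, not from the question): h(x) is the derivative of the revenue g(x), so revenue is maximized where h(x) = 0, i.e. 122 - 2x/34 = 0, which gives x = 122 * 34 / 2 = 2074 (thousand crayons).
x_max = 122 * 34 / 2.0   # root of h(x) = 0
print(x_max)             # 2074.0
print(g(x_max))          # 126514.0 -> the maximum revenue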
Use SymPy. It's simple, beautiful and powerful.
You can write down your expressions with sympify(), like this:
p = sympify('122 - x/34')
And define symbols for symbolic evaluation with Symbol() and symbols().
With that you can simply use the solve() function for any given equation, e.g. x + 4 = 2x:
res = solve('x + 4 - 2*x')
It's pretty much the tool I use for any math work with Python.
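Applied to the crayon question above, a minimal sketch (my own, assuming the revenue to maximize is x * p(x)):
from sympy import symbols, sympify, diff, solve

x = symbols('x')
revenue = x * sympify('122 - x/34')    # revenue = x * p(x)
critical = solve(diff(revenue, x), x)  # solve d(revenue)/dx = 0
print(critical)                        # [2074]
print(revenue.subs(x, critical[0]))    # 126514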
So, you should go and download an external library for this, as it's not functionality that Python makes easy to implement natively. Also, if you're serious about doing mathematical computation in Python, I would suggest switching operating systems to something like OS X or Linux, simply because compiling the old FORTRAN libraries (required for much of the high-performance mathematical computing stack) is a huge pain on Windows.
You have to make use of the scipy library here, which has an optimize module. Specifically I would suggest using the optimize.minimize_scalar function. Docs can be found here.
>>> from scipy.optimize import minimize_scalar
>>> def g(x):
...     return -(x * (122 - (x / 34)))  # negated because you're minimizing
...
>>> minimize_scalar(g, bounds=(1, 10000), method='bounded')
status: 0
nfev: 6
success: True
fun: -126514.0
x: 2074.0
message: 'Solution found.'
When I try to call a Python file and its method using Jython, it shows the following error, even though NumPy, Python and NLTK are correctly installed and everything works properly when I run it directly from the Python shell:
File "C:\Python26\Lib\site-packages\numpy\core\__init__.py", line 5, in <module>
import multiarray
ImportError: No module named multiarray
The code that I am using is a simple one:
PyInstance hello = ie.createClass("PreProcessing", "None");
PyString str = new PyString("my name is abcd");
PyObject po = hello.invoke("preprocess", str);
System.out.println(po);
When I run just the Python file containing the PreProcessing class and call the preprocess method, it works fine, but through Jython it throws this error.
Jython seems unable to import libraries that keep only a compiled version in the folder rather than the Python source itself. For example, instead of multiarray.py there is only multiarray.pyd, the compiled version, so it is not detected by Jython.
Why is it showing this behaviour? How to resolve it?
Please help!
I know this is an old thread, but I recently ran into this same problem and was able to solve it, and I figure the solution should be here in case anyone runs into it in the future. As said above, Jython cannot deal with numpy's pre-compiled C files, but within nltk, the use of numpy is very limited and it's fairly straightforward to rewrite the affected bits of code. That's what I did, and I'm sure it's not the most computationally efficient solution, but it works. This code is found in nltk.metrics.segmentation, and I will only paste the relevant code, but it will still be a little much.
def _init_mat(nrows, ncols, ins_cost, del_cost):
    mat = [[4.97232652e-299 for x in xrange(ncols)] for x in xrange(nrows)]
    for x in range(0, ncols):
        mat[0][x] = x * ins_cost
    for x in range(0, nrows):
        mat[x][0] = x * del_cost
    return mat

def _ghd_aux(mat, rowv, colv, ins_cost, del_cost, shift_cost_coeff):
    for i, rowi in enumerate(rowv):
        for j, colj in enumerate(colv):
            shift_cost = shift_cost_coeff * abs(rowi - colj) + mat[i][j]
            if rowi == colj:
                # boundaries are at the same location, no transformation required
                tcost = mat[i][j]
            elif rowi > colj:
                # boundary match through a deletion
                tcost = del_cost + mat[i][j + 1]
            else:
                # boundary match through an insertion
                tcost = ins_cost + mat[i + 1][j]
            mat[i + 1][j + 1] = min(tcost, shift_cost)
Also at the end of ghd, change the return statement to
return mat[-1][-1]
I hope this helps someone! I don't know if there are other places where this is an issue, but this is the only one that I have encountered. If there are any other issues of this sort, they can be solved in the same way (using a list of lists instead of a numpy array); again, you probably lose some efficiency, but it works.
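As a small sanity check (my own, with made-up values, not from the answer), the rewritten pure-Python _init_mat fills the first row with insertion costs and the first column with deletion costs:
mat = _init_mat(3, 4, 1.0, 2.0)
print(mat[0])                   # [0.0, 1.0, 2.0, 3.0] -> insertion costs along row 0
print([row[0] for row in mat])  # [0.0, 2.0, 4.0]      -> deletion costs down column 0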
Jython is Java. Parts of NumPy are implemented as C extensions to Python (.pyd files). Some parts are implemented as .py files, which will work just fine in Jython; however, they cannot function without access to the C-level code. Currently, there is no way to use NumPy in Jython. See:
Using NumPy and Cpython with Jython
Or
Is there a good NumPy clone for Jython?
for recent discussions of alternatives.