Julia's PyCall package generates Segmentation fault - python

I'm currently using PyCall to load a Python library for LZ-77-based data compression into Julia. The Python library is sweetsourcod, and I have it installed in my home directory. Within that library, I am using the module lempel_ziv for some entropy measurements. I load the Python module following PyCall's example. This is how I'm loading it into Julia:
using PyCall
sc = pyimport("sweetsourcod.lempel_ziv")
PyObject <module 'sweetsourcod.lempel_ziv' from '/Users/danielribeiro/sweetsourcod/sweetsourcod/lempel_ziv.cpython-38-darwin.so'>
Using this Python library seems to cause a segmentation fault within Julia; however, when I write the same code in Python, the segmentation fault does not occur. The following Julia example triggers the segmentation fault:
using PyCall
L = 1000000
nbins = [2*i for i = 1:2:15]
sc = pyimport("sweetsourcod.lempel_ziv")
# loop through all n
for n in nbins
    # loop through all configurations
    for i = 1:65
        # analogous to reading a configuration from scratch
        config = rand(0:255, L)
        # calculate entropy
        # 1.1300458785794484e6 --> cid of random sequence of same L
        entropy = sc.lempel_ziv_complexity(config, "lz77")[2] / 1.1300458785794484e6
    end
end
The line entropy = sc.lempel_ziv_complexity(config, "lz77")[2] / 1.1300458785794484e6 is what triggers the segfault. This is the minimal working example I was able to write in Julia that reproduces the segfault. The function lempel_ziv_complexity() compresses the array and returns a tuple with the LZ factors and the approximate size of the compressed file.
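For readers unfamiliar with the term, here is a rough pure-Python sketch of what an LZ77-style factorization looks like. It is not sweetsourcod's implementation (which is a compiled extension and may parse differently); the function name lz77_factors is made up for illustration, and the quadratic search is only meant for tiny inputs.

```python
def lz77_factors(seq):
    """Greedy LZ77-style parse: each factor is the longest match with
    earlier text, or a single previously unseen symbol."""
    seq = list(seq)
    n = len(seq)
    factors = []
    i = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate start positions in the already-seen prefix
            length = 0
            # a match may run past position i (self-overlap), as in classic LZ77
            while i + length < n and seq[j + length] == seq[i + length]:
                length += 1
            longest = max(longest, length)
        step = max(longest, 1)  # no match at all: emit one new symbol
        factors.append(tuple(seq[i:i + step]))
        i += step
    return factors


factors = lz77_factors("abab")
print(factors)       # [('a',), ('b',), ('a', 'b')]
print(len(factors))  # 3
```

A highly repetitive sequence parses into few factors and a random one into many, which is why the factor count can serve as a rough entropy estimate.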
When I write identical code in Python, the segfault is not triggered. This is the working example in Python:
import numpy as np
from sweetsourcod.lempel_ziv import lempel_ziv_complexity
L = 1000000
nbins = [2*i for i in range(1, 15, 2)]
for n in nbins:
    for i in range(1, 65, 1):
        config = np.random.randint(0, 256, L)
        entropy = lempel_ziv_complexity(config, "lz77")[1] / 1.1300458785794484e6
I suspect the segfault has to do with PyCall's internals, which I am unfamiliar with. I have also tried precompiling sweetsourcod into a module, as suggested in PyCall's README. Does anyone have any suggestions on how to address this issue?
Thank you in advance!

Related

PyCuda - How can I use functions written in Python in the kernel?

I want to parallelize my Python code and I'm trying to use PyCuda.
From what I have seen so far, you have to write a "kernel" in C inside your Python code. This kernel is what gets parallelized. Am I right?
Example (doubling an array of random numbers, from https://documen.tician.de/pycuda/tutorial.html):
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
# Kernel:
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
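To make the index arithmetic in the kernel concrete, here is a plain-Python emulation that runs the kernel body once per thread, sequentially on the CPU. It makes no PyCUDA calls; the function name doublify_emulated is made up, and the 4x4 array is flattened into a list of 16 values to mirror the float *a pointer.

```python
def doublify_emulated(a, block=(4, 4, 1)):
    """Run the body of the doublify kernel once for every
    (threadIdx.x, threadIdx.y) pair in the block, one after another."""
    out = list(a)
    for ty in range(block[1]):        # plays the role of threadIdx.y
        for tx in range(block[0]):    # plays the role of threadIdx.x
            idx = tx + ty * 4         # same flattening as in the kernel
            out[idx] *= 2
    return out


flat = [float(i) for i in range(16)]
doubled = doublify_emulated(flat)
print(doubled[:4])  # [0.0, 2.0, 4.0, 6.0]
```

On the GPU the two loops do not exist: each of the 16 threads executes the two body lines with its own threadIdx values, which is why the body has to be expressible in CUDA C.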
The point is that my Python code has classes and other constructs that suit Python but cannot be translated to C.
Let me clarify: my code has 256 independent for-loops that I want to parallelize. These loops contain Python code that can't be translated to C.
How can I parallelize actual Python code with PyCuda without translating it to C?
You can't.
PyCUDA doesn't support device-side Python; all device code must be written in the CUDA C dialect.
Numba includes a direct Python compiler which can allow an extremely limited subset of Python language features to be compiled and run directly on the GPU. This does not include access to any Python libraries such as numpy, scipy, etc.

Anaconda package for cufft keeping arrays in gpu memory between fft / ifft calls

I am using the Anaconda suite with ipython 3.6.1 and their accelerate package. It contains a cufft sub-package with two functions, fft and ifft. As far as I understand, these take in a numpy array and output a numpy array, both in system RAM; i.e., all GPU memory and the transfers between system and GPU memory are handled automatically, and GPU memory is released when the function ends. This all seems very nice and works for me. However, I would like to run multiple fft/ifft calls on the same array and each time extract just one number from the array. It would be nice to keep the array in GPU memory to minimize system <-> GPU transfers. Am I correct that this is not possible using this package? If so, is there another package that would do the same? I have noticed the reikna project, but that doesn't seem to be available in Anaconda.
The thing I am doing (and would like to do efficiently on gpu) is in short shown here using numpy.fft
import math as m
import numpy as np
import numpy.fft as dft
nr = 100
nh = 2**16
h = np.random.rand(nh)*1j
H = np.zeros(nh,dtype='complex64')
h[10] = 1
r = np.zeros(nr,dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j
for i in np.arange(nr):
    r[i] = h[10]
    H = dft.fft(h,nh)/fftscale
    h = dft.ifft(h*corr)*fftscale
r[nr-1] = h[10]
print(r)
Thanks in advance!
So I found Arrayfire which seems rather easy to work with.

How do you import a Python library within an R package using rPython?

The basic question is this: Let's say I was writing R functions which called python via rPython, and I want to integrate this into a package. That's simple---it's irrelevant that the R function wraps around Python, and you proceed as usual. e.g.
# trivial example
# library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x+y")
  result <- python.get("result")
  return(result)
}
But what if the Python code called by the R functions requires users to import Python libraries first? e.g.
# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
If you run this without importing the library numpy, you will get an error.
How do you import a Python library in an R package? How do you comment it with roxygen2?
It appears the R standard is this:
# R function that calls Python via rPython
# *the import is now executed inside the function, on every call
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('import numpy as np')
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
Each time you run an R function, you will import an entire Python library.
As @Spacedman and @DirkEddelbuettel suggest, you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that users of your package will typically always need.
You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a regression problem because you need to import sys in order to perform the test, (b) the answers to that question suggest that at least in terms of performance, it shouldn't matter, e.g.
If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.
(although admittedly there is some discussion elsewhere on that page about possible scenarios where there could be a performance cost).
But maybe your concern is stylistic rather than performance-oriented ...
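The quoted claim about repeated imports can be checked from plain Python. The sketch below is independent of rPython; the helper name import_twice is made up for illustration, and json is used only because it is a convenient standard-library module.

```python
import importlib
import sys


def import_twice(name):
    """Import `name` twice and report (was it cached, are both the same object)."""
    first = importlib.import_module(name)
    cached = name in sys.modules            # the first import registers the module here
    second = importlib.import_module(name)  # served from the cache; module code is not re-run
    return cached, first is second


result = import_twice("json")
print(result)  # (True, True)
```

So a repeated `import numpy as np` inside python.exec mostly costs a dictionary lookup plus rebinding the name np, provided the calls share one interpreter session.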

Designing a simple Binomial distribution throws core dump in pymc

I am trying to design a simple binomial distribution in pymc. However, it fails with the error below; the same code works fine if I use a Poisson distribution instead of a binomial:
import pymc as pm
from pymc import Beta,Binomial,Exponential
import numpy as np
from pymc.Matplot import plot as mcplot
data = pm.rbinomial(5,0.01,size=100)
p = Beta("p",1,1)
observations = Binomial("obs",5,p,value=data,observed=True)
model = pm.Model([p,observations])
mcmc = pm.MCMC(model)
mcmc.sample(400,100,2)
mcplot(mcmc)
Error
venki@venki-HP-248-G1-Notebook-PC:~/Desktop$ python perf_testing.py
*** glibc detected *** python: free(): corrupted unsorted chunks: 0x0000000003cb0d40 ***
*** glibc detected *** python: malloc(): memory corruption: 0x00000000038bf2e0 ***
I have also created an issue on the pymc GitHub. I am not sure, though, whether I am doing something wrong or this is a bug.
OS
Python 2.7.3
pymc 2.3.4
Ubuntu 12.04.5 LTS
I think that this is a bug (here is a link to the issue you opened, thanks!).
Here is a workaround you can use for now: instead of creating observations as you have done above, use n and p arguments whose dimensions match data:
observations = Binomial("obs", 5*np.ones_like(data),
                        p*np.ones_like(data), value=data, observed=True)

Jython: ImportError: No module named multiarray

When I try to call a file and its method using Jython, it shows the following error, even though NumPy, Python, and NLTK are correctly installed and everything works properly if I run it directly from the Python shell:
File "C:\Python26\Lib\site-packages\numpy\core\__init__.py", line 5, in <module>
    import multiarray
ImportError: No module named multiarray
The code I am using is simple:
PyInstance hello = ie.createClass("PreProcessing", "None");
PyString str = new PyString("my name is abcd");
PyObject po = hello.invoke("preprocess", str);
System.out.println(po);
When I run only the Python file containing the class PreProcessing and call the method preprocess, it works fine, but with Jython it throws an error.
Jython is unable to import libraries that ship only a compiled version in the folder rather than the Python source itself. For example, instead of multiarray.py there is only multiarray.pyd, the compiled version, so Jython does not detect it.
Why is it showing this behaviour? How can I resolve it?
Please help!
I know this is an old thread, but I recently ran into this same problem and was able to solve it, and I figure the solution should be here in case anyone runs into it in the future. As said above, Jython cannot deal with numpy's pre-compiled C files, but within nltk the use of numpy is very limited, and it's fairly straightforward to rewrite the affected bits of code. That's what I did, and I'm sure it's not the most computationally efficient solution, but it works. This code is found in nltk.metrics.segmentation, and I will only paste the relevant code, but it will still be a little much.
def _init_mat(nrows, ncols, ins_cost, del_cost):
    mat = [[4.97232652e-299 for x in xrange(ncols)] for x in xrange(nrows)]
    for x in range(0,ncols):
        mat[0][x] = x * ins_cost
    for x in range(0, nrows):
        mat[x][0] = x * del_cost
    return mat
def _ghd_aux(mat, rowv, colv, ins_cost, del_cost, shift_cost_coeff):
    for i, rowi in enumerate(rowv):
        for j, colj in enumerate(colv):
            shift_cost = shift_cost_coeff * abs(rowi - colj) + mat[i][j]
            if rowi == colj:
                # boundaries are at the same location, no transformation required
                tcost = mat[i][j]
            elif rowi > colj:
                # boundary match through a deletion
                tcost = del_cost + mat[i][j + 1]
            else:
                # boundary match through an insertion
                tcost = ins_cost + mat[i + 1][j]
            mat[i + 1][j + 1] = min(tcost, shift_cost)
Also at the end of ghd, change the return statement to
return mat[-1][-1]
I hope this helps someone! I don't know if there are other places where this is an issue, but this is the only one that I have encountered. Any other issues of this sort can be solved in the same way (using a list of lists instead of a numpy array); again, you probably lose some efficiency, but it works.
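As a quick sanity check of the list-of-lists approach, the two helpers can be exercised on their own. The sketch below repeats them with range instead of xrange so that it runs under both Python 2 and Python 3, and fills the matrix with 0.0 rather than the placeholder value above (every cell is overwritten anyway). The boundary positions and costs are made-up inputs, and the names hyp_idx and ref_idx are mine.

```python
def _init_mat(nrows, ncols, ins_cost, del_cost):
    # list of lists standing in for the numpy array
    mat = [[0.0 for _ in range(ncols)] for _ in range(nrows)]
    for x in range(ncols):
        mat[0][x] = x * ins_cost
    for x in range(nrows):
        mat[x][0] = x * del_cost
    return mat


def _ghd_aux(mat, rowv, colv, ins_cost, del_cost, shift_cost_coeff):
    for i, rowi in enumerate(rowv):
        for j, colj in enumerate(colv):
            shift_cost = shift_cost_coeff * abs(rowi - colj) + mat[i][j]
            if rowi == colj:
                tcost = mat[i][j]                 # boundaries coincide
            elif rowi > colj:
                tcost = del_cost + mat[i][j + 1]  # match through a deletion
            else:
                tcost = ins_cost + mat[i + 1][j]  # match through an insertion
            mat[i + 1][j + 1] = min(tcost, shift_cost)


# hypothetical boundary positions: the third boundary is off by one
hyp_idx = [0, 1, 5]
ref_idx = [0, 1, 4]
mat = _init_mat(len(hyp_idx) + 1, len(ref_idx) + 1, 1.0, 1.0)
_ghd_aux(mat, hyp_idx, ref_idx, 1.0, 1.0, 0.5)
distance = mat[-1][-1]
print(distance)  # 0.5
```

The cheapest edit here is shifting one boundary by a single position, which costs 0.5 with a shift coefficient of 0.5.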
Jython is Java. Parts of NumPy are implemented as C extensions to Python (.pyd files). Other parts are implemented as .py files, which would work just fine in Jython on their own; however, they cannot function without access to the C-level code. Currently, there is no way to use numpy in Jython. See:
Using NumPy and Cpython with Jython
Or
Is there a good NumPy clone for Jython?
For recent discussions on alternatives.
