SciPy solve_ivp crashes script and debugger without error message

I am trying to solve a differential equation with scipy.integrate.solve_ivp using the "RK45" method. However, directly after the first actual call to my system function f, the script crashes without any exception message or similar; it just stops. The same thing happens when I run it under the debugger with python3 -m pdb ./solve.py. I also tried the trace module as described here, but that gives me too much information and I cannot really see where exactly the error appears. The error occurs directly after the system function is called, somewhere inside the scipy module.
I have not yet constructed a minimal example to reproduce this; I might add one later. For now, I am wondering whether there are further ways I could try to debug this problem. The error might occur somewhere outside of the actual Python code.
When I try running it in Jupyter, the same error message as shown in this question appears.
Here is the example:
import numpy
import scipy.integrate as integrate
N = 300
def f(t, x):
    return numpy.ravel(numpy.ones((2, N, N, N), dtype=complex))
ival = numpy.ravel(numpy.ones((2, N, N, N), dtype=complex))
integrate.solve_ivp(f, (0, 100), ival)
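
One general way to get more information out of a silent crash like this (a suggestion, not from the original post) is Python's built-in faulthandler module, which prints a Python-level traceback even when the interpreter dies inside native code:

import faulthandler
faulthandler.enable()  # dump a traceback on SIGSEGV/SIGABRT instead of exiting silently

# ... the rest of solve.py, unchanged ...

Equivalently, the script can be started as python3 -X faulthandler ./solve.py without touching the code. As an aside (an observation, not part of the original post): with N = 300 each state vector holds 2 * 300**3 = 54 million complex128 values, roughly 0.86 GB, and RK45 allocates several such arrays per step, so running out of memory is one plausible reason for the process being killed without a Python exception.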

Related

Improve runtime for calling an R function from a Python function using rpy2

I primarily program in Python (using Jupyter notebooks) but on occasion need to use an R function. I currently do this with rpy2 and R magic, which works fine. Now I would like to write a function that summarizes part of my analysis procedure into one wrapper function (so I don't always need to run all of the code cells but can simply execute the function once). As part of this procedure I need to call an R function. I adapted my code to import the R function into Python using the rpy2.robjects interface with importr. This works, but it is extremely slow (more than triple the run time for an already lengthy procedure), which makes this approach simply not feasible for my analysis. I am assuming this has to do with me accessing R through the high-level interface of rpy2 instead of the low-level interface. I am unsure how to use the low-level interface within a function call, though, and would need some help adapting my code.
I've tried looking into the rpy2 documentation but am struggling to understand it.
This is my code for executing the R function call from within python using R magic.
Activating rpy2 R magic
%load_ext rpy2.ipython
Load my required libraries
%%R
library(scran)
Actually call the R function
%%R -i data_mat -i input_groups -o size_factors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min.mean=0.1)
This is my alternative code to import the R function using rpy2 importr.
from rpy2.robjects.packages import importr
scran = importr('scran')
computeSumFactors = scran.computeSumFactors
size_factors = computeSumFactors(data_mat, clusters=input_groups, min_mean=0.1)
For some reason this second approach is orders of magnitude slower.
Any help would be much appreciated.
The only difference between the two that I can see having an influence on the observed execution speed is conversion.
When running in an "R magic" code cell (prefixed with %%R), in your example the result of calling computeSumFactors() is an R object bound to the symbol size_factors in R. In the other case, the result of calling computeSumFactors() goes through the conversion system (what happens there exactly depends on which converters are active) before the result is bound to the Python symbol size_factors.
Conversion can be costly: you should consider trying to deactivate numpy / pandas conversion (the localconverter context manager can be a convenient way to temporarily use minimal conversion for a code block).
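
For illustration, here is a minimal sketch of that suggestion (my sketch, not code from the answer), where data_mat_r and input_groups_r are placeholder names for R-side versions of the inputs from the question:

import rpy2.robjects as ro
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

scran = importr('scran')

# Use only the default (minimal) converter inside this block, so the
# result is not run through numpy/pandas conversion automatically.
with localconverter(ro.default_converter):
    size_factors_r = scran.computeSumFactors(data_mat_r,
                                              clusters=input_groups_r,
                                              min_mean=0.1)

The result then stays an rpy2-level object; it can be converted back explicitly only where the Python side actually needs it.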

VSCode IntelliSense with Python C extension module (petsc4py)

I'm currently using a python module called petsc4py (https://pypi.org/project/petsc4py/). My main issue is that none of the typical intellisense features seems to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that IntelliSense was unable to look inside ".so" files, but it seems to manage this for numpy's array object, which in my case lives in a file called multiarray.cpython-37m-x86_64-linux-gnu (see the example below).
Does anyone know why I see this behaviour with the petsc4py module? Is there anything that I (or the developers of petsc4py) can do to get IntelliSense to work?
Example:
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
x_p = PETSc.Vec().create()
x_p.setSizes(10)
x_p.setFromOptions()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when working with a Vec object from petsc4py, IntelliSense cannot find u_p.duplicate(); the suggestion is simply a repetition of the function called immediately before. With the numpy array, however, u_n.copy() is suggested perfectly.
If you're compiling in-place then you're bumping up against https://github.com/microsoft/python-language-server/issues/197.

How to get interactive R output in Jupyter (IPython, rpy2), e.g. for a progress bar?

I am trying to use the built-in R progress bar (txtProgressBar) with %%R magic in Jupyter. While it produces a nice animation when executed in the R console or RStudio, it does not produce the desired output in Jupyter (notebook or lab) with the rpy2 extension; instead, it prints all the steps at once after finishing (which makes the progress bar useless). Two questions:
How could I make it work?
If it is not possible yet, how do I approach implementing this functionality on the rpy2 side (I already know how to make the interactive output/widgets on the Jupyter/IPython side)?
Here is a simple snippet of a progress bar from rfunction.com:
%%R
SEQ <- seq(1,100)
pb <- txtProgressBar(1, 100, style=3)
TIME <- Sys.time()
for(i in SEQ){
  Sys.sleep(0.02)
  setTxtProgressBar(pb, i)
}
For the folks new to rpy2: It needs to be installed with pip install rpy2 and the magic needs to be loaded in Jupyter with %load_ext rpy2.ipython.
Edit: The workaround I use for now is to manually invoke the code via robjects.r:
from rpy2.robjects import r
r("""
SEQ <- seq(1,100)
pb <- txtProgressBar(1, 100, style=3)
TIME <- Sys.time()
for(i in SEQ){
  Sys.sleep(0.02)
  setTxtProgressBar(pb, i)
}
""")
However, this is not ideal; I would prefer to keep all the benefits of rpy2's R magic.
There should be a way to achieve this, as the R magic is calling robjects.r() (as you are in your workaround).
In short, the following happens when you submit an %%R Jupyter cell for evaluation:
1. Parameters on the %%R line are evaluated, and any setup prior to the evaluation of the R code is done (e.g., use a local converter, convert input parameters, etc.).
2. The R code in the rest of the %%R cell is evaluated in the R "Global Environment" as a string of code.
3. Exit setup is run and results are returned.
The second step is essentially a call to the R C API which, because of the GIL, is the only activity happening in that process. However, rpy2 defines default callbacks that reroute R's printing to the terminal/console to Python's own print(), which is why you see the prints as the code is running in your call to robjects.r().
I am seeing that the R magic caches the R output, and while there is an attribute cache_display_data that should control this, it is not used. This is a bug, both for the reason you are asking about on Stack Overflow and because an R code block that prints a lot would use more memory than needed (and could even exhaust all RAM). I do not know whether it has always been present or was introduced during a code refactoring; it is now tracked here: https://bitbucket.org/rpy2/rpy2/issues/543
Edit: The fix is now in the repository, and will be part of rpy2-3.0.3 (likely released today).
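
As a small illustration of the callback mechanism mentioned above (my sketch, not part of the original answer; it assumes rpy2 3.x, where the console hooks live in rpy2.rinterface_lib.callbacks), the function receiving R's console output can be replaced from Python, for example to force an immediate, flushed write:

import sys
import rpy2.rinterface_lib.callbacks as callbacks

def consolewrite_immediate(s):
    # Send R's console output straight to stdout and flush right away,
    # rather than letting it be buffered or cached.
    sys.stdout.write(s)
    sys.stdout.flush()

callbacks.consolewrite_print = consolewrite_immediate

Whether this helps inside the %%R magic depends on the caching issue described above, so treat it as an experiment rather than a fix.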

object was probably modified after being freed with Python on Mac

I have the following code in a function:
phi = fourier_matrix(y, fs)
N = np.size(phi, axis=1)
x = np.ones(N)
for i in range(136):
    W_pk = np.diag(x)
    temp = pinv(np.dot(phi, W_pk))
    q_k = np.dot(temp, y)
    x = np.dot(W_pk, q_k)
where phi is (96,750), W_pk is (750,750) and q_k is (750,).
This throws the following error:
Python(12001,0x7fff99d8b380) malloc: *** error for object 0x7fc71aa37200: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
If I comment out the last dot product, the error does not appear.
I think I need to free memory in some way, or maybe do the dot product in a different way?
Also, this only happens when I run it on a Mac; on Windows or Linux it does not throw the error.
Python is 3.6 (also tried 3.7), and numpy is 1.14.5 (also tried 1.15).
Any help would be greatly appreciated, since I really need to make this work!
Thanks in advance.
EDIT I:
I tried this portion of the code in a Jupyter notebook, and it didn't fail. This confused me even more! It fails when I run it in Visual Studio Code on a Mac. The rest of my code, an algorithm to remove artifacts from a signal, works as it should until I add that last piece of code x = np.dot(W_pk, q_k). Maybe it works in Jupyter because I don't run the rest of the algorithm there? But as I said, it only crashes on that last dot product.
EDIT II: I added the piece of code above the for loop to this question, because I found that the problem is somehow related to how x is being used. You see, it's declared above as a float64 ndarray. When execution reaches the last line of the for loop, the dot product returns a complex128 array (should be complex64, I don't know what's happening there) and overwrites the x array. The first iteration works, but the second time it crashes when trying to overwrite. If I use a new variable for the dot product, say z, then it does not crash! I am not sure why, but I need to overwrite x in each iteration.
Furthermore, if I do something like this:
z = np.dot(W_pk, q_k)
x = abs(z)  # I don't need complex numbers at this point
Then it crashes with the same error on the first dot product (presumably):
temp = pinv(np.dot(phi, W_pk))
Also, the memory consumption is not that bad, around 110 MB according to some measurements, and the same algorithm does not crash in IPython even with twice the memory usage. This is what I find the most obscure: why doesn't it crash in IPython?
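
There is no accepted answer here, but going purely by the EDIT II observation that the crash coincides with x switching dtype from float64 to complex128 inside the loop, one thing worth trying (a sketch only, not a confirmed fix; phi, y, fourier_matrix and pinv are the names from the question) is to keep x complex from the start so the overwrite never changes the array's dtype:

phi = fourier_matrix(y, fs)
N = np.size(phi, axis=1)
x = np.ones(N, dtype=np.complex128)  # start complex so the dtype stays stable

for i in range(136):
    W_pk = np.diag(x)
    temp = pinv(np.dot(phi, W_pk))
    q_k = np.dot(temp, y)
    x = np.dot(W_pk, q_k)  # complex128 either way; no dtype switch on overwrite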

How do you import a Python library within an R package using rPython?

The basic question is this: let's say I am writing R functions which call Python via rPython, and I want to integrate this into a package. That's simple: it's irrelevant that the R function wraps around Python, and you proceed as usual, e.g.
# trivial example
# library(rPython)
add <- function(x, y) {
  python.assign("x", x)
  python.assign("y", y)
  python.exec("result = x+y")
  result <- python.get("result")
  return(result)
}
But what if the Python code behind the R functions requires users to import Python libraries first? e.g.
# python code, not R
import numpy as np
print(np.sin(np.deg2rad(90)))
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
If you run this without importing the library numpy, you will get an error.
How do you import a Python library in an R package? How do you comment it with roxygen2?
It appears the R standard is this:
# R function that calls Python via rPython
# *this function will not run without first executing `import numpy as np`
print_sin <- function(degree){
  python.assign("degree", degree)
  python.exec('import numpy as np')
  python.exec('result = np.sin(np.deg2rad(degree))')
  result <- python.get('result')
  return(result)
}
Each time you run an R function, you will import an entire Python library.
As @Spacedman and @DirkEddelbuettel suggest, you could add a .onLoad/.onAttach function to your package that calls python.exec to import the modules that will typically always be required by users of your package.
You could also test whether the module has already been imported before importing it, but (a) that gets you into a bit of a regression problem because you need to import sys in order to perform the test, and (b) the answers to that question suggest that, at least in terms of performance, it shouldn't matter, e.g.
If you want to optimize by not importing things twice, save yourself the hassle because Python already takes care of this.
(although admittedly there is some discussion elsewhere on that page about possible scenarios where there could be a performance cost).
But maybe your concern is stylistic rather than performance-oriented ...
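
For reference, the "test before importing" idea in point (a) amounts to something like the following Python check (a sketch only; in the rPython setting it would be shipped to the embedded interpreter as a string via python.exec, and as the quoted answer says it is usually not worth the trouble):

import sys

# Only import numpy if this Python session has not imported it yet.
if 'numpy' not in sys.modules:
    import numpy as np
else:
    np = sys.modules['numpy']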
