rpy2: refactor an R magic cell into a Python function

I currently use rpy2 in a Jupyter notebook to fit a von Mises distribution. The code is:
%load_ext rpy2.ipython
%R require(movMF)
%%R -i dir_data,n_vM_dir -o theta,alpha
result = movMF(dir_data, n_vM_dir, nruns = 10)
theta = result$theta
alpha = result$alpha
Input: dir_data,n_vM_dir
Output: theta,alpha
This takes the Python variables dir_data and n_vM_dir and passes them into R. After the fit, theta and alpha are passed back to Python so I can use them in later analysis.
Now I want to refactor this code into a Python function so I can reuse it. How can I do that?
This is what I have so far:
import rpy2.robjects as robjects
# pass dir_data, n_vM_dir into R
robjects.r('''
result = movMF(dir_data, n_vM_dir, nruns = 10)
theta = result$theta
alpha = result$alpha
''')
theta = robjects.r('theta')
alpha = robjects.r('alpha')
# Return theta, alpha
I can read the results back through robjects.r. The main problem is that I don't know how to pass data stored in Python (dir_data and n_vM_dir in this example) to R.
I've read the docs at http://rpy2.readthedocs.io/en/version_2.8.x/introduction.html
There, R variables are created like this:
from rpy2.robjects import FloatVector
ctl = FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14])
But this looks very complex compared to the R magic in a Jupyter notebook.
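One way to do this without the cell magic is to build the R objects explicitly and call movMF through importr. The following is a minimal sketch, not a tested drop-in: the names fit_movmf and column_major_values are made up for illustration, and it assumes that dir_data is a 2-D array with one observation per row and that the movMF R package is installed.

```python
import numpy as np


def column_major_values(array_2d):
    """Flatten a 2-D array column by column, the order R's matrix() fills in."""
    return np.asarray(array_2d, dtype=float).ravel(order="F").tolist()


def fit_movmf(dir_data, n_components, nruns=10):
    """Fit a movMF mixture in R and return (theta, alpha) as NumPy arrays.

    Hypothetical helper: assumes dir_data is 2-D (rows = observations).
    """
    # rpy2 is imported here so that the helper above stays usable without R.
    import rpy2.robjects as robjects
    from rpy2.robjects.packages import importr

    movmf = importr("movMF")
    data = np.asarray(dir_data, dtype=float)

    # Build an R matrix from the column-major values.
    r_matrix = robjects.r["matrix"](
        robjects.FloatVector(column_major_values(data)),
        nrow=data.shape[0],
        ncol=data.shape[1],
    )

    result = movmf.movMF(r_matrix, n_components, nruns=nruns)

    # rx2 extracts a list element by name, like result$theta in R.
    theta = np.asarray(result.rx2("theta"))
    alpha = np.asarray(result.rx2("alpha"))
    return theta, alpha
```

It would then be called as theta, alpha = fit_movmf(dir_data, n_vM_dir), with nothing left behind in R's global environment.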

Related

kernlab sigest function arguments

I would like to use kernlab's sigest() function from Python to estimate a good range of sigma values for constructing RBF kernels. I am using rpy2, but I can't figure out what to pass for the "na_action" argument.
Recommended syntax in R:
sigest(x, frac = 0.5, scaled = TRUE, na.action = na.omit)
My syntax:
sigest(np.asmatrix(x), frac = 0.5, scaled = True,
na_action = pandas2ri.pandas.DataFrame.dropna)
x is the data matrix. I also tried
sigest(np.asmatrix(x), frac = 0.5, scaled = True,
na_action = pd.DataFrame.dropna)
Libraries used: matplotlib, numpy, pandas, plus numpy2ri and pandas2ri from rpy2.
import matplotlib
import numpy as np
import pandas as pd
import rpy2
import rpy2.robjects as robj
from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri
rpy2.robjects.numpy2ri.activate()
lab = importr("kernlab")
# omitting the initialization of x: it is read from a CSV file and is an array of shape (40, 1)
y = lab.sigest(np.asmatrix(x), frac = 0.5, scaled = True, na_action = 'ignore')
None of those pandas methods will work for the na.action argument, which expects an R function such as stats::na.omit, so you must somehow reference that R function. Additionally, because the parameter name contains a dot, which is not allowed in Python identifiers, consider translating the parameter name manually with rpy2's SignatureTranslatedFunction if importr does not handle it automatically:
from rpy2.robjects.functions import SignatureTranslatedFunction
from rpy2.robjects.packages import importr
lab = importr('kernlab')
lab.sigest = SignatureTranslatedFunction(lab.sigest,
init_prm_translate = {'na_action': 'na.action'})
Then try passing the required action (under the renamed parameter) as a string, so that Python does not try to call it directly. The same approach works for other R functions that take an na.action argument, such as t.test, cor.test, and lm:
y = lab.sigest(np.asmatrix(x), frac=0.5, scaled=True, na_action="na.omit")

Python equivalent of R's tmvtnorm::rtmvnorm

To simulate some data, I need to sample random numbers from a truncated multivariate normal distribution, which is exactly what the R function tmvtnorm::rtmvnorm does.
I have tried the function in R, but my script is mostly written in Python, so I would like to know whether any Python function can do the same thing.
I have tried truncnorm in scipy and emcee (a Python library), but neither gives results like those from tmvtnorm::rtmvnorm.
For now, I am using rpy2 to get the output from R.
My questions are:
Are there any Python tools that work like tmvtnorm::rtmvnorm?
What are the differences between tmvtnorm::rtmvnorm and truncnorm in scipy?
Thanks.
We can call R from Python and retrieve the output generated by rtmvnorm:
from pyper import *
import pandas as pd
r=R(use_pandas=True)
r('''
library(tmvtnorm)
sigma <- matrix(c(4,2,2,3), ncol=2)
x <- rtmvnorm(n=500, mean=c(1,2), sigma=sigma, upper=c(1,0))
''')
out = pd.DataFrame(r.get('x'))
print(out.head(5))
# 0 1
# 0 -0.832567 -1.976393
# 1 0.466617 -0.266892
# 2 0.802809 -0.403514
# 3 -2.295357 -1.896990
# 4 -0.128641 -0.392827
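For comparison, a similar sample can be drawn in plain NumPy by rejection sampling: draw from the untruncated multivariate normal and keep only the draws that fall inside the bounds. This is only a sketch. The function name sample_truncated_mvn is hypothetical, and the approach becomes slow when the truncation region carries little probability mass. Note that scipy's truncnorm is a univariate distribution, so applying it coordinate by coordinate cannot reproduce the correlation encoded in sigma, which may explain results that look different from rtmvnorm.

```python
import numpy as np


def sample_truncated_mvn(n, mean, cov, lower=None, upper=None, seed=None):
    """Draw n samples from a truncated multivariate normal by rejection."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    d = mean.shape[0]
    # Missing bounds mean "no truncation" on that side.
    lower = np.full(d, -np.inf) if lower is None else np.asarray(lower, dtype=float)
    upper = np.full(d, np.inf) if upper is None else np.asarray(upper, dtype=float)

    accepted = [np.empty((0, d))]
    count = 0
    while count < n:
        # Propose from the untruncated distribution in batches.
        draws = rng.multivariate_normal(mean, cov, size=max(n, 1000))
        # Keep only rows where every coordinate lies within its bounds.
        keep = np.all((draws >= lower) & (draws <= upper), axis=1)
        accepted.append(draws[keep])
        count += int(keep.sum())
    return np.concatenate(accepted)[:n]


# Same parameters as the R call above: mean=c(1,2), sigma, upper=c(1,0).
samples = sample_truncated_mvn(
    500, mean=[1, 2], cov=[[4, 2], [2, 3]], upper=[1, 0], seed=0
)
```

The individual values will not match the R output, since the random number generators differ; only the distribution should be comparable.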

rpy2 Dynamic Time Warping (dtw) in python - windowing does not work

A now closed discussion shows how to use the R dtw package in python. This is a little clumsy, but the R dtw package is great and better than currently available python dtw implementations. Unfortunately, the windowing functions like the Sakoe-Chiba band do not work when trying to specify a "window.size". There appears to be an issue with the mapping to the argument. Note that "." in arguments is supposed to be replaced with "_" when using rpy2. But following this convention, the argument is not being used for some reason.
import numpy as np
import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
rpy2.robjects.numpy2ri.activate()
# Set up our R namespaces
R = rpy2.robjects.r
DTW = importr('dtw')
# Generate our data
idx = np.linspace(0, 2*np.pi, 100)
template = np.cos(idx)
query = np.sin(idx) + np.array(R.runif(100))/10
# Calculate the alignment vector and corresponding distance
alignment = R.dtw(query, template, keep=True,window_type='sakoechiba',
window_size=5)
>>> RRuntimeError: Error in window.function(row(wm), col(wm), query.size= n, reference.size = m, :
argument "window.size" is missing, with no default
You can see that the error states "window.size" is missing, despite "window_size" clearly being specified in the rpy2 fashion.
Just a note from the future: this question is now superseded by the feature-equivalent dtw-python package (also found on PyPI). The rpy2-R-dtw bridge should no longer be necessary.
Answering my own question in case anyone ever has the same issue. The problem is the argument mapping and the R three dots ellipsis ‘...’. This can be fixed by specifying the mapping manually.
from rpy2.robjects.functions import SignatureTranslatedFunction
R.dtw = SignatureTranslatedFunction(R.dtw,
init_prm_translate={'window_size': 'window.size'})
So with this specification the window_size argument is used correctly.
import numpy as np
import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.functions import SignatureTranslatedFunction
rpy2.robjects.numpy2ri.activate()
# Set up our R namespaces
R = rpy2.robjects.r
DTW = importr('dtw')
R.dtw = SignatureTranslatedFunction(R.dtw,
init_prm_translate={'window_size': 'window.size'})
# Generate our data
idx = np.linspace(0, 2*np.pi, 100)
template = np.cos(idx)
query = np.sin(idx) + np.array(R.runif(100))/10
# Calculate the alignment vector and corresponding distance
alignment = R.dtw(query, template, keep=True,window_type='sakoechiba',
window_size=10)
dist = alignment.rx('distance')[0][0]
print(dist)
>>> 117.348292359
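To make clear what window.size controls, here is a small pure-NumPy sketch of DTW with a Sakoe-Chiba band. It is an illustration only: dtw_distance is a hypothetical helper that uses the absolute difference as local cost and weights every step equally, so its numbers are not expected to match the output of the R dtw package.

```python
import numpy as np


def dtw_distance(query, reference, window_size=None):
    """DTW distance; cells with |i - j| > window_size are never visited."""
    n, m = len(query), len(reference)
    if window_size is None:
        w = max(n, m)  # no band: the whole matrix is allowed
    else:
        # The band must be at least |n - m| wide to reach the last cell.
        w = max(window_size, abs(n - m))

    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only columns inside the band around the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # step in the query only
                cost[i, j - 1],      # step in the reference only
                cost[i - 1, j - 1],  # step in both
            )
    return cost[n, m]


unbanded = dtw_distance([0, 0, 1], [0, 1, 1])
diagonal_only = dtw_distance([0, 0, 1], [0, 1, 1], window_size=0)
```

With window_size=0 only the diagonal is allowed, so the second call cannot warp the shifted step and returns a larger distance than the first.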

How to specify the number of peaks in Python

So far I have found four ways to find peaks in Python, but none of them lets me specify the number of peaks the way Matlab does. Can someone provide some insight?
import scipy.signal as sg
import numpy as np
# Method 1
sg.find_peaks_cwt(vector, np.arange(1,4),max_distances=np.arange(1, 4)*2)
# Method 2
sg.argrelextrema(np.array(vector),comparator=np.greater,order=2)
# Method 3
sg.find_peaks(vector, height=7, distance=2.1)
# Method 4
detect_peaks.detect_peaks(vector, mph=7, mpd=2)
Below is the Matlab code that I want to emulate:
[pks,locs] = findpeaks(data,'Npeaks',n)
If you want the exact function Matlab has, why not just use that function? If you have the rest of your data in Python, then you can just use the module provided by Matlab.
import matlab.engine #import matlab engine
eng = matlab.engine.start_matlab() #Start matlab engine
a = [(0.1*i)*(0.1*i-1)*(0.1*i-2) for i in range(50)] #Create some data with peaks
b = eng.findpeaks(matlab.double(a),'Npeaks',1) #Find 1 peak
Try the findpeaks library. Multiple methods are available for detecting peaks and valleys in 1D vectors and 2D arrays (images).
pip install findpeaks
Let's create some peaks:
import numpy as np
i = 10000
xs = np.linspace(0, 3.7*np.pi, i)
X = (0.3*np.sin(xs) + np.sin(1.3 * xs) + 0.9 * np.sin(4.2 * xs) + 0.06 *
     np.random.randn(i))
# import library
from findpeaks import findpeaks
# Initialize
fp = findpeaks()
# Find the peaks (high/low)
results = fp.fit(X)
# Make plot
fp.plot()
# Some of the results:
results['df']
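If staying within SciPy is preferable, one option is to let find_peaks return all peaks and trim the result afterwards. The sketch below assumes the goal is either the first n peaks in order of occurrence or the n tallest ones. Both helper names are hypothetical, and plateau and endpoint handling may differ from Matlab's findpeaks. Indices are 0-based.

```python
import numpy as np
from scipy.signal import find_peaks


def first_n_peaks(data, n):
    """Return (pks, locs) for the first n peaks in order of occurrence."""
    data = np.asarray(data)
    locs, _ = find_peaks(data)
    locs = locs[:n]
    return data[locs], locs


def tallest_n_peaks(data, n):
    """Return (pks, locs) for the n tallest peaks, tallest first."""
    data = np.asarray(data)
    locs, _ = find_peaks(data)
    # Sort peak positions by descending peak height, then keep n of them.
    order = np.argsort(data[locs])[::-1][:n]
    locs = locs[order]
    return data[locs], locs


vector = [0, 1, 0, 3, 0, 2, 0]
pks_first, locs_first = first_n_peaks(vector, 2)
pks_tall, locs_tall = tallest_n_peaks(vector, 2)
```

Any of the usual find_peaks filters (height, distance, prominence) can be passed inside the helpers before trimming.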

SciPy: generating custom random variable from PMF

I'm trying to generate random variables according to a certain ugly distribution, in Python. I have an explicit expression for the PMF, but it involves some products which makes it unpleasant to obtain and invert the CDF (see below code for explicit form of PMF).
In essence, I'm trying to define a random variable in Python by its PMF and then have built-in code do the hard work of sampling from the distribution. I know how to do this if the support of the RV is finite, but here the support is countably infinite.
The code I am currently trying to run as per @askewchan's advice below is:
import scipy as sp
import numpy as np
class x_gen(sp.stats.rv_discrete):
    def _pmf(self,k,param):
        num = np.arange(1+param, k+param, 1)
        denom = np.arange(3+2*param, k+3+2*param, 1)
        p = (2+param)*(np.prod(num)/np.prod(denom))
        return p
pa_limit = limitrv_gen()
print pa_limit.rvs(alpha,n=1)
However, this returns the error while running:
File "limiting_sim.py", line 42, in _pmf
num = np.arange(1+param, k+param, 1)
TypeError: only length-1 arrays can be converted to Python scalars
Basically, it seems that the np.arange() list isn't working somehow inside the def _pmf() function. I'm at a loss to see why. Can anyone enlighten me here and/or point out a fix?
EDIT 1: cleared up some questions by askewchan, edits reflected above.
EDIT 2: askewchan suggested an interesting approximation using the factorial function, but I'm looking more for an exact solution such as the one that I'm trying to get work with np.arange.
You should be able to subclass rv_discrete like so:
class mydist_gen(rv_discrete):
    def _pmf(self, n, param):
        return yourpmf(n, param)
Then you can create a distribution instance with:
mydist = mydist_gen()
And generate samples with:
mydist.rvs(param, size=1000)
Or you can then create a frozen distribution object with:
mydistp = mydist(param)
And finally generate samples with:
mydistp.rvs(1000)
With your example, this should work, since factorial automatically broadcasts. But, it might fail for large enough alpha:
import numpy as np
from scipy import stats
from scipy.special import factorial  # scipy.misc.factorial was removed in newer SciPy releases
class limitrv_gen(stats.rv_discrete):
    def _pmf(self, k, alpha):
        # num = np.prod(np.arange(1+alpha, k+alpha))
        num = factorial(k+alpha-1) / factorial(alpha)
        # denom = np.prod(np.arange(3+2*alpha, k+3+2*alpha))
        denom = factorial(k + 2 + 2*alpha) / factorial(2 + 2*alpha)
        return (2+alpha) * num / denom
pa_limit = limitrv_gen()
alpha = 100
pa_limit.rvs(alpha, size=10)
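As for why np.arange fails: rv_discrete calls _pmf with k as a NumPy array rather than a scalar, and np.arange only accepts scalar bounds. One way to keep the exact product formula while still broadcasting is to write the two products as ratios of gamma functions and evaluate them in log space with scipy.special.gammaln. This also avoids the overflow that factorial hits in double precision for arguments above 170. A sketch, assuming the support starts at k = 1 (where both products are empty) and alpha > 0:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


class limitrv_gen(stats.rv_discrete):
    def _pmf(self, k, alpha):
        # prod(arange(1+alpha, k+alpha)) == Gamma(k+alpha) / Gamma(1+alpha)
        log_num = gammaln(k + alpha) - gammaln(1 + alpha)
        # prod(arange(3+2*alpha, k+3+2*alpha))
        #   == Gamma(k+3+2*alpha) / Gamma(3+2*alpha)
        log_denom = gammaln(k + 3 + 2 * alpha) - gammaln(3 + 2 * alpha)
        # Combine in log space, then exponentiate once.
        return (2 + alpha) * np.exp(log_num - log_denom)


# a=1 sets the lower end of the support (an assumption, see above).
pa_limit = limitrv_gen(a=1, name="limitrv")
samples = pa_limit.rvs(2.0, size=5, random_state=0)
```

For alpha = 2 and k = 4 this gives 4 * (3*4*5) / (7*8*9*10) = 1/21, the same value as the explicit products. Sampling goes through SciPy's generic inverse-CDF search, so it can be slow for large sample sizes.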
