Matlab cannot call Python code that imports statsmodels - python

This question concerns Matlab 2014b, Python 3.4 and Mac OS 10.10.
I have the following Python file tmp.py:
from statsmodels.tsa.arima_process import ArmaProcess
import numpy as np
def generate_AR_time_series():
    arparams = np.array([-0.8])
    maparams = np.array([])
    ar = np.r_[1, -arparams]
    ma = np.r_[1, maparams]
    arma_process = ArmaProcess(ar, ma)
    return arma_process.generate_sample(100)
I want to call generate_AR_time_series from Matlab so I used:
py.tmp.generate_AR_time_series()
which gave a vague error message
Undefined variable "py" or class "py.tmp.generate_AR_time_series".
To look into the problem further, I tried
tmp = py.eval('__import__(''tmp'')', struct);
which gave me a detailed but still obscure error message:
Python Error:
dlopen(/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/special/_ufuncs.so, 2): Symbol
not found: __gfortran_stop_numeric_f08
Referenced from: /opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/special/_ufuncs.so
Expected in: /Applications/MATLAB_R2014b.app/sys/os/maci64/libgfortran.3.dylib
in /opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/special/_ufuncs.so
I can call the function from within Python just fine, so I guess the problem is with Matlab. From the detailed message, it seems that a symbol is expected in the Matlab installation path, but of course the Matlab installation path does not contain it, since these are third-party libraries for Python.
How can I solve this problem?
Edit 1:
libgfortran.3.dylib can be found in a lot of places:
/Applications/MATLAB_R2014a.app/sys/os/maci64/libgfortran.3.dylib
/Applications/MATLAB_R2014b.app/sys/os/maci64/libgfortran.3.dylib
/opt/local/lib/gcc48/libgfortran.3.dylib
/opt/local/lib/gcc49/libgfortran.3.dylib
/opt/local/lib/libgcc/libgfortran.3.dylib
/Users/wdg/Documents/MATLAB/mcode/nativelibs/macosx/bin/libgfortran.3.dylib

Try:
setenv('DYLD_LIBRARY_PATH', '/usr/local/bin/');

For me, using the setenv approach from within MATLAB did not work. Also, MATLAB modifies the DYLD_LIBRARY_PATH variable during startup to include necessary libraries.
First, you have to make sure which version of gfortran scipy was linked against. In Terminal.app, enter
otool -L /opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/scipy/special/_ufuncs.so
and look for 'libgfortran' in the output.
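If you prefer, the same check can be scripted from Python (a sketch using subprocess; the .so path is the one from the error message):

import subprocess
so = ('/opt/local/Library/Frameworks/Python.framework/Versions/3.4/'
      'lib/python3.4/site-packages/scipy/special/_ufuncs.so')
# run the same otool -L command and keep only the libgfortran lines
out = subprocess.check_output(['otool', '-L', so], universal_newlines=True)
print([line for line in out.splitlines() if 'libgfortran' in line])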
It worked for me to copy $(MATLABROOT)/bin/.matlab7rc.sh to my home directory and change the line LDPATH_PREFIX='' in the mac section (around line 195 in my case) to LDPATH_PREFIX='/opt/local/lib/gcc49', or whatever path to libgfortran you found above.
This ensures that /opt/local/lib/gcc49/libgfortran.3.dylib is found before the MATLAB version, but leaves other paths intact.

Related

Python subprocess call won't install R package

I have a Python subprocess to call R:
cmd = ['Rscript', 'Rcode.R', 'file_to_process.txt']
out = subprocess.run(cmd, universal_newlines = True, stdout = subprocess.PIPE)
lines = out.stdout.splitlines() #split stdout
My R code first checks if the 'ape' package is installed before proceeding:
if (!require("ape")) install.packages("ape")
library(ape)
do_R_stuff.......
return_output_to_Python
Previously, the whole process from Python to R worked perfectly - R was called and the processed output was returned to Python - until I added the first line (if (!require("ape")) install.packages("ape")). Now Python reports: "there is no package called 'ape'" (i.e. when ape is uninstalled in R). I have tried wait instructions in both the R and Python scripts, but I can't get it working. When checked, the R code works in isolation.
The full error output from Python is:
Traceback (most recent call last):
File ~\Documents\GitHub\wolPredictor\wolPredictor_MANUAL_parallel.py:347 in <module>
if __name__ == '__main__': main()
File ~\Documents\GitHub\wolPredictor\wolPredictor_MANUAL_parallel.py:127 in main
cophen, purge, pge_incr, z, _ = R_cophen('{}/{}'.format(dat_dir, tree), path2script) #get Dist Mat from phylogeny in R
File ~\Documents\GitHub\wolPredictor\wolPredictor_MANUAL_parallel.py:214 in R_cophen
purge = int(np.max(cophen) * 100) + 1 #max tree pw distance
File <__array_function__ internals>:5 in amax
File ~\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:2754 in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File ~\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:86 in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Loading required package: ape
Installing package into 'C:/Users/Windows/Documents/R/win-library/4.1'
(as 'lib' is unspecified)
Error in contrib.url(repos, "source") :
trying to use CRAN without setting a mirror
Calls: install.packages -> contrib.url
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'ape'
Execution halted
Point 1: The path to the R libraries when using R in standalone mode may not be the same as when using Rscript.
Point 2: The error says there was difficulty finding the CRAN repository, so perhaps the options that set the repos were not set in the Rscript environment. They can be set in the call to install.packages or with a Sys.setenv() call.
The OP wrote: "Thanks @IRTFM, I had to set a new subprocess in a new .R file specifically for the install, but the CRAN mirror was key. I never realised it would be an issue as it's not a requirement on my local machine (not sure why it becomes an issue through subprocess)."
The places to find more information are the ?Startup help page and the ?Rscript page. Rscript has many fewer defaults. Even the usual set of recommended packages may not get loaded by default. The Rscript help page includes these flags, which could be used for debugging and for setting a proper path to the desired libraries (see the sketch after this list):
--verbose
gives details of what Rscript is doing.
--default-packages=list
where list is a comma-separated list of package names or NULL. Sets the environment variable R_DEFAULT_PACKAGES which determines the packages loaded on startup.
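For example, both flags can be passed straight from the Python side (a sketch; the package list is illustrative):

import subprocess
# invoke Rscript with debugging output and an explicit set of startup packages
cmd = ['Rscript', '--verbose', '--default-packages=utils,stats',
       'Rcode.R', 'file_to_process.txt']
out = subprocess.run(cmd, universal_newlines=True, stdout=subprocess.PIPE)
print(out.stdout)  # inspect what Rscript loaded and did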
Here is a previous similar SO question with an answer that includes some options for constructing a proper working environment: Rscript: There is no package called ...?
There are a few R packages on CRAN that aid in overcoming some of the differences between programming for standalone R and Rscript.
getopt: https://cran.r-project.org/web/packages/getopt/index.html
optparse: https://cran.r-project.org/web/packages/optparse/index.html (styled after a similar Python package.)
argparse: https://cran.r-project.org/web/packages/argparse/index.html
I solved the issue (thanks to @IRTFM) by placing the if-then-install.packages code in a separate Rscript (including the CRAN mirror):
if (!require("ape")) install.packages("ape", repos='http://cran.us.r-project.org')
which I then called using a separate Python subprocess in my Python routine
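A minimal sketch of that routine (install_ape.R is a hypothetical name for a script containing only the line above):

import subprocess
# step 1: install 'ape' if needed, with an explicit CRAN mirror
subprocess.run(['Rscript', 'install_ape.R'], check=True)
# step 2: run the actual analysis as before
out = subprocess.run(['Rscript', 'Rcode.R', 'file_to_process.txt'],
                     universal_newlines=True, stdout=subprocess.PIPE)
lines = out.stdout.splitlines()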

sklearn internals access cython classes and functions

I am interested in testing out many of the internal classes and functions defined within sklearn (e.g. maybe adding a print statement to the tree builder so I can see how the tree got built). However, as many of the internals were written in Cython, I want to learn the best practices and workflows for testing out these functions in a Jupyter notebook.
For example, I managed to import the Stack class from the tree._utils module. I was even able to construct it, but unable to call any of its methods. Any thoughts on what I should do in order to call and test the cdef classes and their methods in Python?
%%cython
from sklearn.tree import _utils
s = _utils.Stack(10)
print(s.top())
# AttributeError: 'sklearn.tree._utils.Stack' object has no attribute 'top'
There are some problems which must be solved in order to be able to use the C-interfaces of the internal classes.
First problem (skip if your sklearn version is >= 0.21.x):
Until version 0.21.x, sklearn used implicit relative imports (as in Python 2), so compiling it with Cython's language_level=3 (the default in IPython3) would not work. For versions < 0.21.x you need to set language_level=2 (i.e. %cython -2) or, even better, update scikit-learn.
Second problem:
We need to include the path to the numpy headers. Let's take a look at a simpler version:
%%cython
from sklearn.tree._tree cimport Node
print("loaded")
which fails with the unhelpful error "command 'gcc' failed with exit status 1" - but the real reason can be seen in the terminal, where gcc prints its error message (not to the notebook):
fatal error: numpy/arrayobject.h: No such file or directory
compilation terminated.
_tree.pxd uses the numpy-API, and thus we need to provide the location of the numpy headers.
That means we need to add include_dirs=[numpy.get_include()] to the Extension definition. There are two ways to do this in the %%cython magic. The first is the -I option:
%%cython -I <path from numpy.get_include()>
...
The second, somewhat dirtier trick exploits the fact that the %%cython magic adds the numpy include automatically when it sees the string "numpy", so a comment like
%%cython
# requires numpy headers
...
is enough.
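The path for the -I variant can be printed in an ordinary notebook cell:

import numpy
print(numpy.get_include())  # paste this path into the %%cython -I option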
Last but not least:
Note: since 0.22 this is no longer an issue, as pxd-files are included in the installation (see this).
The pxd-files must be present in the installation for us to be able to cimport them. This is the case for the pxd-files from the sklearn.tree subpackage, as one can see in the local setup.py file (given this PR, this seems to be a more or less random decision without a strategy behind it):
...
config.add_data_files("_criterion.pxd")
config.add_data_files("_splitter.pxd")
config.add_data_files("_tree.pxd")
config.add_data_files("_utils.pxd")
...
but not for some other cython extensions, in particular not for the sklearn.neighbors subpackage. And that is a problem for your example:
%%cython
# requires numpy headers
from sklearn.tree._utils cimport Stack
s = Stack(10)
print(s.top())
fails to be cythonized, because _utils.pxd cimports data structures from neighbors/*.pxd's:
...
from sklearn.neighbors.quad_tree cimport Cell
...
which are not present in the installation.
The situation is described in more detail in this SO post; your options to build are (as described in the link):
copy pxd-files to the installation
reinstall from the downloaded source with pip install -e
reinstall from the downloaded source after manipulating the corresponding local setup.py files
Another option is to ask the developers of sklearn to include the pxd-files in the installation, so that not only building but also distribution becomes possible.

VSCode Intellisense with python C extension module (petsc4py)

I'm currently using a python module called petsc4py (https://pypi.org/project/petsc4py/). My main issue is that none of the typical intellisense features seem to work with this module.
I'm guessing it might have something to do with it being a C extension module, but I am not sure exactly why this happens. I initially thought that intellisense was unable to look inside ".so" files, but it works for numpy's array object, which in my case lives in a file called multiarray.cpython-37m-x86_64-linux-gnu.so (see the example below).
Does anyone know why I see this behaviour in the petsc4py module. Is there anything that I (or the developers of petsc4py) can do to get intellisense to work?
Example:
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
x_p = PETSc.Vec().create()
x_p.setSizes(10)
x_p.setFromOptions()
u_p = x_p.duplicate()
import numpy as np
x_n = np.array([1,2,3])
u_n = x_n.copy()
In this example, when trying to work with a Vec object from petsc4py, intellisense cannot find duplicate() on u_p; the suggestion is simply a repetition of the function called immediately before. However, with the numpy array, u_n.copy() works perfectly.
If you're compiling in-place then you're bumping up against https://github.com/microsoft/python-language-server/issues/197.

Gurobi Python Error (NoneType has no len())

I need to write an optimization file for Gurobi (Python) that is a modified version of a classic TSP. I tried to run the example file from their website:
examples.gurobi.com/traveling-salesman-problem/
I always get the following error:
TypeError: object of type 'NoneType' has no len()
What do I need to change?
Thx
Full code: https://www.dropbox.com/s/ewisx805b3o2wq5/beispiel_opt.py?dl=0
I can confirm the error with the example code from Gurobi's website. At first look, the problem seems to be the subtour function, which returns None if sum(lengths) == n, combined with the missing check for tour is None inside the subtourlim function.
Instead of providing a fix for the specific code, I first checked the examples that Gurobi installs inside the specific installation directory:
Mac: /Library/gurobi810/mac64/examples/python/
Linux: /opt/gurobi800/linux64/examples/python/
Windows: c:\gurobi800\win64\examples\python\
And surprisingly, the tsp.py from there runs without any errors. Note also that the two functions mentioned above have been revised. So I guess the example on the website is just an old version of the code.
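For reference, a hedged sketch of the missing guard (assuming the subtour helper, the n variable, and the model._vars pattern from the website example) could look like:

from itertools import combinations
import gurobipy as gp
from gurobipy import GRB

def subtourlim(model, where):
    if where == GRB.Callback.MIPSOL:
        # rebuild the selected edges from the candidate integer solution
        vals = model.cbGetSolution(model._vars)
        selected = gp.tuplelist((i, j) for i, j in model._vars.keys()
                                if vals[i, j] > 0.5)
        tour = subtour(selected)
        if tour is None:  # the old website code can return None here
            return        # nothing to cut in that case
        if len(tour) < n:
            # add a lazy subtour elimination constraint
            model.cbLazy(gp.quicksum(model._vars[i, j]
                                     for i, j in combinations(tour, 2))
                         <= len(tour) - 1)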

TensorFlow 1.5.0-rc0: error using `tf.app.flags`

The following flags were defined in a misc_fun.py file to include machine and directory info:
import tensorflow as tf
flags = tf.app.flags
FLAGS = flags.FLAGS
# definitions
flags.DEFINE_string(
    'DEFAULT_IN',
    '~/PycharmProjects/myNN/Data/',
    """Default input folder.""")
...
It worked fine in TensorFlow versions 1.0 - 1.4 (with PyCharm). After updating to TensorFlow 1.5.0-rc0, the following error occurred:
Usage:
from misc_fun import FLAGS
FLAGS.DEFAULT_IN = FLAGS.DEFAULT_DOWNLOAD # change default input folder
Error:
UnparsedFlagAccessError: Trying to access flag --DEFAULT_DOWNLOAD before flags were parsed.
However, print(FLAGS) worked fine and gave:
misc_fun:
--DEFAULT_DOWNLOAD: default download folder for large datasets.
(default: '/home/username/Downloads/Data/')
--DEFAULT_IN: default input folder.
(default: '~/PycharmProjects/myNN/Data/')
...
I tried FLAGS = flags.FLAGS(sys.argv), resulting in the following error:
UnrecognizedFlagError: Unknown command line flag 'f'
Although there is a workaround using the class object, I wonder what could be the problem here.
I worked around this by adding the following line:
tf.app.flags.DEFINE_string('f', '', 'kernel')
This solution is different from others in that it is simple and easy to try: you just add it to your code, and it doesn't change your system. Please let me know if it helps solve other people's problems.
The reference for this solution is a Chinese website: https://blog.csdn.net/qq_39956625/article/details/80500291
With 1.5.0-rc0 the TensorFlow maintainers switched tf.app.flags to the flags module from abseil. Unfortunately, it is not 100% API-compatible with the previous implementation. I worked around your problem with something like
remaining_args = FLAGS([sys.argv[0]] + [flag for flag in sys.argv if flag.startswith("--")])
assert(remaining_args == [sys.argv[0]])
before accessing the FLAGS object the first time.
Alternatively, you can use FLAGS(sys.argv, known_only=True) to parse all related flags (the ones defined using tf.app.flags.DEFINE_xxx) while leaving any unknown args alone. This is useful if you have command line arguments that are not related to TF.
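For example (a short sketch, reusing the FLAGS object defined above):

import sys
# parse only the flags defined via tf.app.flags.DEFINE_xxx;
# unknown command line arguments are returned rather than raising an error
remaining_args = FLAGS(sys.argv, known_only=True)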
