Can dill remember libraries used by a class? - python

If I create a class that imports a library and use dill to pickle it, when I unpickle it I cannot find the library:
import dill
from sklearn.metrics.cluster import adjusted_rand_score
import pandas as pd
import random
class Test1():
    def __init__(self, df):
        self.genomes = df
    @staticmethod
    def percentageSimilarityDistance(genome1, genome2):
        if len(genome1) != len(genome2):
            raise ValueError('Genome1 and genome2 must have the same length!')
        is_gene_correct = [1 if genome1[idx] == genome2[idx] else 0 for idx in range(len(genome1))]
        return (1 - sum(is_gene_correct)/(len(is_gene_correct) * 1.0))
    def createDistanceMatrix(self, distance_function):
        """Takes a dictionary of KO sets and returns a distance (or similarity) matrix which is basically how many genes do they have in common."""
        genomes_df = self.genomes.copy()
        no_of_genes, no_of_genomes = genomes_df.shape
        list_of_genome_names = list(genomes_df.columns)
        list_of_genomes = [list(genomes_df.loc[:, name]) for name in list_of_genome_names]
        distance_matrix = [[distance_function(list_of_genomes[i], list_of_genomes[j]) for j in range(no_of_genomes)] for i in range(no_of_genomes)]
        distance_matrix = pd.DataFrame(distance_matrix, columns = list_of_genomes, index = list_of_genomes)
        return distance_matrix
# create fake data
df = pd.DataFrame({'genome' + str(idx + 1): [random.randint(0, 1) for lidx in range(525)] for idx in range(10)})
test1 = Test1(df)
# Test2 is a second class from the same file (it appears in the traceback below); its body is not shown here
test2 = Test2(df)
# save pickles
with open('test1.pkl', 'wb') as pkl:
    dill.dump(test1, pkl)
I successfully unpickle the file but when I try to use one of the methods it can't find Pandas.
$ ipython
Python 3.5.4 |Anaconda custom (64-bit)| (default, Nov 20 2017, 18:44:38)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import dill
In [2]: with open('test1.pkl', 'rb') as pkl:
   ...:     test1 = dill.load(pkl)
   ...:
In [3]: test1.createDistanceMatrix(test1.percentageSimilarityDistance)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-5918638722b1> in <module>()
----> 1 test1.createDistanceMatrix(test1.percentageSimilarityDistance)
/space/oc13378/myprojects/python/dill_tests/dill_tests.py in createDistanceMatrix(self, distance_function)
29 return distance_matrix
30
---> 31 class Test2():
32 import dill
33 from sklearn.metrics.cluster import adjusted_rand_score
NameError: name 'pd' is not defined
Is it possible to get this to work by only importing the dill library?

I'm the dill author. The easy thing to do is to put the import inside the function. Further, if you put the import both inside and outside your function, then you won't have a speed hit on the first call of your function.
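A minimal sketch of that suggestion follows. The class and method names (Distance, euclidean) are hypothetical, and the standard-library math module stands in for pandas so that the snippet runs without third-party packages; the same pattern would apply to import pandas as pd inside createDistanceMatrix.

```python
import math  # module-level import: executed once, when the defining script runs


class Distance:
    """Hypothetical stand-in for a class that will be pickled with dill."""

    @staticmethod
    def euclidean(a, b):
        # Repeating the import inside the function means the name is bound
        # when the function runs, even in a fresh session that never
        # executed the module-level import above.
        import math
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


result = Distance.euclidean([0, 0], [3, 4])
print(result)
```

Because Python caches imported modules, the import inside the function is cheap after the module has been loaded once, which is why keeping both imports avoids a slow first call.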

Related

How to save pyprover object with pickle

I want to save logical expressions of pyprover with pickle.
The following is the code I wrote in Google Colaboratory.
!pip install pyprover
import pickle
from pyprover import *
logic = ~A & B
print(1,logic)
print(2,type(logic))
print(3,type(A))
with open("test.pickle","wb") as f:
    pickle.dump(logic,f)
with open("test.pickle","rb") as f:
    logic2 = pickle.load(f) # error
print(logic2)
The output is below
1 ~A & B
2 <class 'pyprover.logic.And'>
3 <class 'pyprover.logic.Prop'>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-27-4cd4fe38fcb4> in <module>
12
13 with open("test.pickle","rb") as f:
---> 14 logic2 = pickle.load(f) # error
15
16 print(logic2)
AttributeError: 'Top' object has no attribute 'elems'
How can I save the logical expression with pickle?
pyprover github
If you know how to save the object, I do not care whether or not it uses pickle. I tried dill, only to get the same result as with pickle.
I opened a dill issue here
I tried importing the modules like this, but it didn't solve the problem.
import dill
import pyprover
from pyprover import *
from pyprover.logic import *
from pyprover.parser import *
from pyprover.constants import *
from pyprover.atoms import *
from pyprover.util import *
from pyprover.tools import *
from pyprover.__coconut__ import *
from pyprover.__init__ import *

When using RDKIT, object is not iterable error appears

I am trying to use Tanimoto similarity to compare molecular fingerprints using RDKit. I want to compare the two items in list 1 with the one item in list 2. However, I keep getting an error. I do not understand it because I do not have anything named "Mol" in my code. Does anyone have any advice? Thank you
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from rdkit.Chem import DataStructs
mol1 = ('CCO', 'CCOO')
mol2 = ('CC')
fii = Chem.MolFromSmiles(mol2)
fpgen1 = rdFingerprintGenerator.GetMorganGenerator(radius=2)
fps1 = [fpgen1.GetFingerprint(m) for m in fii]
for m in mol1:
    fi = Chem.MolFromSmiles(m)
    fpgen2 = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    fps2 = [fpgen2.GetFingerprint(m) for m in fi]
    for x in fsp2:
        t = DataStructs.TanimotoSimilarity(fps1, fps2(x))
        print(t)
ERROR:
fps1 = [fpgen1.GetFingerprint(m) for m in fii]
TypeError: 'Mol' object is not iterable
Mol is the name of the RDKit class that is returned when you call Chem.MolFromSmiles, not one of your variable names.
The error says that a Mol object is not iterable: it is a single molecule, so there is nothing to loop over. Build a list of molecules first, then generate one fingerprint per molecule:
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from rdkit.Chem import DataStructs
smiles1 = ('CCO', 'CCOO')
smiles2 = ('CC',)
mols1 = [Chem.MolFromSmiles(smi) for smi in smiles1]
mols2 = [Chem.MolFromSmiles(smi) for smi in smiles2]
# you only need to instantiate the generator once, you can use it for both lists
fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2)
fps1 = [fpgen.GetFingerprint(m) for m in mols1]
fps2 = [fpgen.GetFingerprint(m) for m in mols2]
# if you only care about the single entry in fps2 you can just index it
for n, fp in enumerate(fps1):
    t = DataStructs.TanimotoSimilarity(fp, fps2[0])
    print(n, t)
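The corrected code also changes ('CC') to ('CC',). A short sketch of why that matters, using only plain Python (the variable names here are illustrative and not part of the RDKit API): parentheses alone do not create a tuple, so the original mol2 was just a string.

```python
smiles_wrong = ('CC')    # parentheses alone do not make a tuple: this is the str 'CC'
smiles_right = ('CC',)   # the trailing comma makes a one-element tuple

# Looping over the string walks its characters; looping over the tuple
# yields the whole SMILES string once.
items_wrong = [s for s in smiles_wrong]
items_right = [s for s in smiles_right]

print(type(smiles_wrong).__name__, items_wrong)
print(type(smiles_right).__name__, items_right)
```

With the string version, a list comprehension would silently parse each character as its own SMILES, which is rarely what is intended.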

TypeError: 'module' object is not callable in python and pandas

I imported the module correctly into pandas and called it correctly using import main and then main.main(data, 1,10,2.5) but I am getting an error:
TypeError Traceback (most recent call last)
<ipython-input-52-e9913b227737> in <module>()
----> 1 main.main(data, 1, 10, 2.5)
38 dat_sh = data.shape[0]
39 #Z = random.sample(range(0,U),k_max)
---> 40 Z = cf.centroid_finder(data,sp_atr,k_max)
41
42 prototypes = {i: data[j:j+1].values.tolist()[0] for i,j in enumerate(Z)}
11 for i in range(dat_sh):
12 for j in range(dat_sh):
---> 13 D[i][j] = ed(sub_atr[i],sub_atr[j])
14
15
TypeError: 'module' object is not callable
ed is euclidean:
def ed(X2, X1):
    return sqrt(sum(np.subtract(X1,X2)**2))
There may be some confusion regarding the importing of modules vs functions.
It's possible to recreate your error with a simple example, assuming that module/function importing has gotten mixed up somewhere along the way. Consider a case where we pass initial values a and b through a centroid_finder() function, which lives in cf.py:
# import cf.py as cf
import cf
a, b = ([1,2,3], [2,3,4])
cf.centroid_finder(a, b)
But centroid_finder() calls ed(), which lives in ed.py:
## cf.py
# import ed.py as ed
import ed
def centroid_finder(a, b):
    print(ed(a, b))
## ed.py
from numpy import sqrt, sum
import numpy as np
def ed(X2, X1):
    return sqrt(sum(np.subtract(X1,X2)**2))
Here, calling centroid_finder() will give the error you observed:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-e76ec6a0496c> in <module>()
3 a, b = ([1,2,3], [2,3,4])
4
----> 5 cf.centroid_finder(a, b)
cf.py in centroid_finder(a, b)
2
3 def centroid_finder(a, b):
----> 4 print(ed(a, b))
TypeError: 'module' object is not callable
That's because you imported a module, ed.py as ed...but what you wanted was to call the ed() function that lives inside of ed.py. That's ed.ed()!
Changing centroid_finder() to call ed.ed() produces the desired result:
# cf.py
import ed
def centroid_finder(a, b):
    print(ed.ed(a, b))
Now, from the main script:
cf.centroid_finder(a, b)
# 1.73205080757
There's at least one undisclosed import shorthand in your example code, where you call sqrt in ed(). Python has no built-in sqrt(); it was most likely imported from either math or numpy, e.g. from numpy import sqrt. That's not a problem, per se, but given the error you're getting about modules being non-callable, you might benefit from explicitly calling functions in your code from the modules they live in.
For example, using import numpy as np, call np.sqrt() instead of just importing sqrt() directly. This is a defensive programming posture that will prevent similar confusion in the future. (You do this already with np.subtract() in ed(); it's unclear why sqrt() doesn't get the same treatment.)
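The same error can be reproduced without any project files. This is a small sketch that uses the standard-library math module in place of ed.py, purely for illustration: calling the module object fails, while calling the function through its module namespace works.

```python
import math

try:
    math(9)                # calling the module object itself, like ed(a, b)
except TypeError as exc:
    message = str(exc)
    print(message)

root = math.sqrt(9)        # calling the function that lives inside the module
print(root)
```

Keeping the module prefix on every call makes it obvious at the call site whether a name refers to a module or to a function.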

Python: cannot import name x for importing module

** EDIT: Copy-pasting my actual file to ease confusion. The code snippet below is in a file named train_fm.py:
def eval_fm(x,b,w,V):
    # evaluate a degree 2 FM. x is p X B
    # V is p x k
    # some python code that computes yhat
    return(yhat);
Now in my main file: I say the following
from train_fm import eval_fm
and I get the error:
ImportError: cannot import name f1
When I type
from train_fm import train_fm
I do not get an error.
OLD QUESTION BELOW :
def train_fm(x,y,lb,lw,lv,k,a,b,w,V):
    # some code
    yhat = eval_fm(x,b,w,V);
    # OUTPUTS
    return(b,w,V);
I have a file called f2.py, where I define 2 functions (note that one of the functions has the same name as the file)
def f1():
    some stuff;
    return(stuff)
def f2():
    more stuff;
    y = f1();
    return(y)
In my main file, I do
from aaa import f1
from aaa import f2
but when I run the first of the 2 commands above, I get
ImportError: cannot import name f1
Any idea what is causing this? The second function gets imported fine.

What object to pass to R from rpy2?

I'm unable to make the following code work, though I don't see this error working strictly in R.
from rpy2.robjects.packages import importr
from rpy2 import robjects
import numpy as np
forecast = importr('forecast')
ts = robjects.r['ts']
y = np.random.randn(50)
X = np.random.randn(50)
y = ts(robjects.FloatVector(y), start=robjects.IntVector((2004, 1)), frequency=12)
X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
It's especially confusing considering the following code works fine
forecast.auto_arima(y, xreg=X)
I see the following traceback no matter what I give for X, using numpy interface or not. Any ideas?
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-20-b781220efb93> in <module>()
13 X = ts(robjects.FloatVector(X), start=robjects.IntVector((2004, 1)), frequency=12)
14
---> 15 forecast.Arima(y, xreg=X, order=robjects.IntVector((1, 0, 0)))
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
/home/skipper/.local/lib/python2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
33 for k, v in kwargs.iteritems():
34 new_kwargs[k] = conversion.py2ri(v)
---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
36 res = conversion.ri2py(res)
37 return res
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = if (ncol(xreg) == 1) nmxreg else paste(nmxreg, :
length of 'dimnames' [2] not equal to array extent
Edit:
The problem is that the following lines of code do not evaluate to a column name, which seems to be the expectation on the R side.
sub = robjects.r['substitute']
deparse = robjects.r['deparse']
deparse(sub(X))
I don't know well enough what the expectations of this code should be in R, but I can't find an RPy2 object that passes this check by returning something of length == 1. This really looks like a bug to me.
R> length(deparse(substitute((rep(.2, 1000)))))
[1] 1
But in Rpy2
[~/]
[94]: robjects.r.length(robjects.r.deparse(robjects.r.substitute(robjects.r('rep(.2, 1000)'))))
[94]:
<IntVector - Python:0x7ce1560 / R:0x80adc28>
[ 78]
This is one manifestation (see this other related issue for example) of the same underlying issue: R expressions are evaluated lazily and can be manipulated within R, and this leads to idioms that do not translate well (in Python, expressions are evaluated immediately, and one has to move to the AST to manipulate code).
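To illustrate the Python side of that contrast, here is a small sketch using only the standard-library ast module. The helper show() is hypothetical, and ast.unparse assumes Python 3.9 or later. A function only ever receives the evaluated argument; recovering the source text means parsing the code explicitly.

```python
import ast


def show(value):
    # Hypothetical helper: by the time it runs, the argument has already
    # been evaluated, so the original expression text is gone.
    return repr(value)


evaluated = show(1 + 2)

# To handle the unevaluated expression, parse the source into an AST.
# Nothing is executed here, so rep() does not need to exist in Python.
tree = ast.parse("rep(.2, 1000)", mode="eval")
unevaluated = ast.unparse(tree)

print(evaluated)
print(unevaluated)
```

This is roughly the role that substitute() and deparse() play in R, except that R makes the unevaluated argument available inside the called function itself.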
An answer to the second part of your question: in R, substitute(rep(.2, 1000)) passes the unevaluated expression rep(.2, 1000) to substitute(). Doing the following in rpy2
substitute('rep(.2, 1000)')
is passing a string; the R equivalent would be
substitute("rep(.2, 1000)")
The following gets you close to R's deparse(substitute()):
from rpy2.robjects.packages import importr
base = importr('base')
from rpy2 import rinterface
# expression
e = rinterface.parse('rep(.2, 1000)')
dse = base.deparse(base.substitute(e))
>>> len(dse)
1
>>> print(dse) # not identical to R
"expression(rep(0.2, 1000))"
Currently, one way to work around this is to bind R objects to R symbols
(preferably in a dedicated environment rather than in GlobalEnv), and use
the symbols in an R call written as a string:
from rpy2.robjects import Environment, reval
env = Environment()
for k,v in (('y', y), ('xreg', X), ('order', robjects.IntVector((1, 0, 0)))):
    env[k] = v
# make an expression; it must use the R symbols bound above (y, xreg, order)
expr = rinterface.parse("forecast::Arima(y, xreg=xreg, order=order)")
# evaluate in the environment
res = reval(expr, envir=env)
This is not something I am happy about as a solution, but I have never found the time to work on a better solution.
edit: With rpy2-2.4.0 it becomes possible to use R symbols and do the following:
RSymbol = robjects.rinterface.SexpSymbol
pairlist = (('x', RSymbol('y')),
            ('xreg', RSymbol('xreg')),
            ('order', RSymbol('order')))
res = forecast.Arima.rcall(pairlist,
                           env)
This is not yet the most intuitive interface. Maybe something using a context manager would be better.
There is a way to simply pass your variables to R without substitutions and return the results to Python. You can find a simple example here https://stackoverflow.com/a/55900840/5350311 . I think it makes it clearer what you are passing to R and what you get back in return, especially if you are working with for loops and a large number of variables.
