Python package: Bioservices, error using UniChem() command - python

I was following the tutorial on the webpage:
http://pythonhosted.org/bioservices/compound_tutorial.html
Everything worked well until I reached the following command:
uni = UniChem()
and then I received the error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "P:\Anaconda\lib\site-packages\bioservices\unichem.py", line 84, in __init__
maxid_service = int(self.get_all_src_ids()[-1]['src_id'])
TypeError: list indices must be integers, not str
As a minimum working example:
from bioservices import *
uni = UniChem()
and then I receive the error. I understand the error (for the most part), but I don't know how to fix it. So my question is: how do I fix the function or work around it?
The overall aim is to map a list of 1000 drug names (and hopefully more in the near future) to ChEMBL IDs.

The error you saw is probably because the UniChem service was down for maintenance, or took too long to respond, when you tried to connect. The service was therefore never initialised, hence the error message you got.
I've just tried (bioservices 1.2.6)
from bioservices import *
uni = UniChem()
and it worked. The following request also worked:
>>> mapping = uni.get_mapping("kegg_ligand", "chembl")
'CHEMBL278315'
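If the failure really is a temporary outage, one way to work around it is to retry the constructor a few times before giving up. The following is only a minimal sketch under that assumption: create_with_retry is a hypothetical helper, not part of bioservices, and the exception types it catches are guessed from the traceback in the question.

```python
import time


def create_with_retry(factory, attempts=3, delay=5.0):
    """Call factory() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except (TypeError, OSError) as err:
            # TypeError is what the question's traceback showed; OSError
            # is included as an assumption to cover network failures.
            last_error = err
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(
        f"service still unavailable after {attempts} attempts"
    ) from last_error


# Hypothetical usage with the class from the question:
#   from bioservices import UniChem
#   uni = create_with_retry(UniChem)
```

The helper takes any zero-argument callable, so it can be tried without network access by passing a function that fails a couple of times before returning.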

Related

Using Python, NLTK, to analyse German text

I am a beginner in Python and am currently trying to use NLTK to analyze German text (extracting the German nouns and their frequency in the text) by following this tutorial: https://datascience.blog.wzb.eu/2016/07/13/accurate-part-of-speech-tagging-of-german-texts-with-nltk/
There are several issues that I faced during the process and I am not able to solve them.
When I follow the website to execute the code below:
import random
tagged_sents = list(corp.tagged_sents())
random.shuffle(tagged_sents)
split_perc = 0.1
split_size = int(len(tagged_sents) * split_perc)
train_sents, test_sents = tagged_sents[split_size:], tagged_sents[:split_size]
it produces this error:
Traceback (most recent call last):
File "test2.py", line 7, in <module>
tagged_sents = list(corp.tagged_sents())
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 130, in tagged_sents
return LazyMap(get_tagged_words, self._grids(fileids))
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 215, in _grids
return concat(
File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 433, in concat
raise ValueError("concat() expects at least one object!")
ValueError: concat() expects at least one object!
Then I tried to fix it by following this solution: https://teamtreehouse.com/community/randomshuffle-crashes-when-passed-a-range-somenums-randomshufflerange5250
and changed
tagged_sents = list(corp.tagged_sents())
to
tagged_sents = list(range(5,250))
The ValueError no longer appears, but I don't understand what (5,250) means, even though I have read the explanation.
Then I continued with the following step:
from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
tagger = ClassifierBasedGermanTagger(train=train_sents)
And it shows
Traceback (most recent call last):
File "test1.py", line 90, in <module>
from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
ModuleNotFoundError: No module named 'ClassifierBasedGermanTagger'
I have already downloaded ClassifierBasedGermanTagger.py and __init__.py and put them in the folder linked to VS Code, but I don't know if that is correct, as the tutorial says:
'Using his Python class ClassifierBasedGermanTagger (which you can download from the github page) we can create a tagger and train it with the data from the TIGER corpus:'
Please help me to fix these problems, thanks!
First of all, welcome to StackOverflow! Before posting a question, please do your own research first; most of the time that solves the problem.
Secondly, range(start, end) is a basic Python built-in that produces the integers from start up to, but not including, end. Replacing the corpus with range(5, 250) only swaps your tagged sentences for a list of numbers, so it hides the error rather than fixing it. I would suggest using print to see what data corp actually holds and debugging from there. Maybe corp is just empty, and that's why you don't get any tagged_sents.
For the import part, it is not clear to me where you put ClassifierBasedGermanTagger.py, but wherever it is, it is not visible to your code. You can try putting your script and ClassifierBasedGermanTagger.py in the same directory. Read the link below for more details on how to import modules properly in Python.
https://docs.python.org/3/reference/import.html
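To illustrate the visibility point, here is a small self-contained sketch of how Python finds a module only when its directory is on sys.path. The module name demo_tagger and the temporary directory are hypothetical stand-ins, not the real tagger files. Note also that the import statement in the question (from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ...) refers to a package, so it presumably expects a folder named ClassifierBasedGermanTagger, containing __init__.py and ClassifierBasedGermanTagger.py, next to the script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the folder that holds the downloaded module
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_tagger.py").write_text("NAME = 'demo'\n")

# The directory is not on sys.path yet, so the import fails
try:
    importlib.import_module("demo_tagger")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# After adding the directory to sys.path, the same import succeeds
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
demo_tagger = importlib.import_module("demo_tagger")

print(found_before, demo_tagger.NAME)
```

The directory of the script being run is placed on sys.path automatically, which is why keeping the files next to the script is usually the simplest fix.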

What's causing this np.array data type error, and what's the fix?

I'm working on an image output project, and I can't figure out what the problem with this line is:
Traceback (most recent call last):
input_array[i].append(np.array(Image.fromarray(img_input).resize(float(g_scale), resample=Image.BICUBIC)))
TypeError: Cannot handle this data type
The "cannot handle this data type" was manually written in case an error like this happened. I searched for multiple possible problems with the Image.fromarray line and couldn't narrow it down to this specific line's needs. Would appreciate any help!
Here's a fuller view of the for loop being used; it's essentially testing the network:
for i, gscale in enumerate(gscales):
    if float(g_scale) == 1:
        input_array[i].append(img_input)
    else:
        input_array[i].append(np.array(Image.fromarray(img_input).resize(float(g_scale), resample=Image.BICUBIC)))
    output_array[i].append(eval_model.predict_on_batch(input_array[i][-1]))
Depending on your editor, it might be because you used g_scale rather than gscale inside the for loop. If g_scale also happens to be defined elsewhere, no NameError is raised for it, and the first error you see is instead the one from the line after else.
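Two further things may be worth checking on that line, independent of the variable name. Pillow's Image.resize expects a (width, height) tuple rather than a single float, and Image.fromarray raises "TypeError: Cannot handle this data type" when the array's dtype and shape are not ones it supports. The sketch below only computes the size tuple; scaled_size is a hypothetical helper, and the commented usage assumes img_input holds 0-255 pixel values.

```python
def scaled_size(width, height, scale):
    """Return the (width, height) tuple for an image scaled by `scale`."""
    # Round to whole pixels and never go below 1x1
    return (max(1, round(width * scale)), max(1, round(height * scale)))


# Hypothetical use inside the loop from the question (gscale, not g_scale):
#   h, w = img_input.shape[:2]
#   resized = Image.fromarray(img_input.astype(np.uint8)).resize(
#       scaled_size(w, h, float(gscale)), resample=Image.BICUBIC)

print(scaled_size(640, 480, 0.5))  # (320, 240)
```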

Python-Weka-Wrapper3 removing attributes from arff file error

I have an arff file and I need to remove the first 5 attributes from it (without manually deleting them). I tried to use Python-Weka-Wrapper3, which exposes Weka's filtering options, as explained here; however, I get an error when using the following code:
import weka.filters as Filter
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2,3,4,5"])
The error that I receive is the following:
Traceback (most recent call last):
File "/home/user/Desktop/file_loading.py", line 16, in <module>
removing = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
TypeError: 'module' object is not callable
What could be the reason for this error? I would also appreciate it if anyone knows an alternative way to remove attributes from an arff file using Python.
You are attempting to call the module object instead of the class object.
Try using:
from weka.filters import Filter
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2,3,4,5"])
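The same mistake can be reproduced with the standard library alone, which may make the difference easier to see. This sketch uses collections.Counter purely as a stand-in for weka.filters.Filter; it says nothing about Weka itself.

```python
import collections as Counter    # binds the whole module to the name Counter

try:
    Counter("abc")               # calling a module object fails
except TypeError as err:
    message = str(err)

from collections import Counter  # now Counter is the class inside the module

counts = Counter("abc")          # calling the class works
print(message)
print(counts["a"])
```

In both cases the name on the left of the call looks the same; what differs is whether the import bound it to the module or to the class defined inside that module.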

OSError: [Errno 22] Invalid argument Python File Processing

I am currently undergoing my A2 studies in Computer Science and I am having difficulties with random access file processing.
I am trying to have a list UsersArray, which stores some record data types (UsersArray = [lion,soso,Sxia]), loop through the list, and store each record in the file TEST.DAT at a specific offset calculated like this: Address = hash(UsersArray[i].Password). The problem occurs when I try to do File.seek(Address). It gives me an error saying that the argument to seek() is invalid, and I don't understand why this error occurs.
import Users,pickle
File = open("TEST.DAT","rb+")
lion = Users.Users()
lion.Password = "ilovefood"
soso = Users.Users()
soso.Password = "cats123"
Sxia = Users.Users()
Sxia.Password = "luca<3"
UsersArray = [lion,soso,Sxia]
for i in range(3):
    Address = hash(UsersArray[i].Password)
    File.seek(Address)
    pickle.dump(UsersArray[i],File)
File.close()
Error Message:
Traceback (most recent call last):
File "C:\Users\Vaio\Desktop\PythonA2\File Processing\RandomAccessWrite.py", line 17, in <module>
File.seek(Address)
OSError: [Errno 22] Invalid argument
[Finished in 0.1s with exit code 1]
[shell_cmd: python -u "C:\Users\Vaio\Desktop\PythonA2\File Processing\RandomAccessWrite.py"]
[dir: C:\Users\Vaio\Desktop\PythonA2\File Processing]
[path: C:\MinGW\bin;C:\Users\Vaio\AppData\Local\Programs\Python\Python36-32\Scripts\;C:\Users\Vaio\AppData\Local\Programs\Python\Python36-32\]
Thank you for the help in advance!
I am inclined to believe that jasonharper nailed the issue. I replicated your code using my own user objects and commented out the pickle.dump() line. I was able to print each user together with its hash value without any issues. Then I uncommented pickle.dump() and passed my own (small) loop counter to File.seek() instead; when I did this, everything worked fine and Python wrote to the file. I think the hash values that you're calculating are too large to be used as file offsets. I'm not sure whether using them is part of your assignment, but those hash values won't work as offsets.
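For what it's worth, the built-in hash() of a string can also be negative, and in Python 3 it changes from one run to the next unless PYTHONHASHSEED is fixed, so it is a poor basis for file addresses either way. Below is a minimal sketch of one alternative, assuming fixed-size record slots; RECORD_SIZE, NUM_SLOTS and slot_offset are hypothetical names chosen for illustration, and collisions between keys are not handled.

```python
import hashlib

RECORD_SIZE = 256   # hypothetical number of bytes reserved per record
NUM_SLOTS = 100     # hypothetical number of record slots in the file


def slot_offset(key: str) -> int:
    """Map a key to a non-negative byte offset that stays inside the file."""
    # hashlib gives the same digest on every run, unlike the built-in hash()
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % NUM_SLOTS
    return slot * RECORD_SIZE


for password in ["ilovefood", "cats123", "luca<3"]:
    print(password, slot_offset(password))
```

Each offset is a multiple of RECORD_SIZE and smaller than NUM_SLOTS * RECORD_SIZE, so File.seek() would accept it; each pickled record would then have to fit within RECORD_SIZE bytes.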

Cannot perform operations using rpy2 in Python: "TypeError: argument 1 must be a str, not int"

So I'm trying to get to grips with using the rpy2 module (I am familiar with R but new to Python). Following this tutorial, I first load the library and assign it to the variable 'r' using:
import rpy2
import rpy2.robjects as robjects
r = robjects.r
then I try to perform a simple operation to confirm everything is working:
print(r[2+2])
but I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\site-packages\rpy2\robjects\__init__.py", line 248, in __getitem__
res = _globalenv.get(item)
TypeError: argument 1 must be str, not int
I'm sure it's just something stupid I'm doing wrong, but any advice would be much appreciated. I'm using python3.4.2 (64bit), rpy2-2.5.6 (64bit) on a Windows 7 machine (64bit).
You should use print(r(2+2)) instead of print(r[2+2]).
When you use r[2+2], you are asking r for the item stored under the key 4 (the result of 2+2). As the traceback shows, r looks items up by name, so it expects a string there and rejects an integer.
OK, I think I have figured it out. For R to evaluate the expression inside the parentheses, the expression must be in quotes, e.g.
r("2+2")
This is what was confusing me because this looks like I'm providing a string.
Oddly, I can't print the result (4) by using:
print(r("2+2"))
as this prints:
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
print(r("2+2"))
File "C:\Python34\lib\site-packages\rpy2\robjects\robject.py", line 49, in __str__
s = str.join(os.linesep, s)
TypeError: sequence item 0: expected str instance, bytes found
Instead I just print the result using:
answer = r("2+2")
answer[0]
Because R is vector-based, the result is a vector whose first element is the answer, so you have to index it at the first position. Otherwise you get:
answer = r("2+2")
answer
<FloatVector - Python:0x0000000005836EC8 / R:0x00000000047A51A0>
[4.000000]
Thanks for your help
Hefin
