I'm running a Python script that imports the spyrmsd module. I get an error when I reach the line containing io.loadmol() for loading molecules:
from spyrmsd import io,rmsd
ref = io.loadmol("tempref.pdb")
I'm getting the following error:
Reference Molecule:lig_ref.pdb
PDBqt File:ligand_vina_out.pdbqt
Traceback (most recent call last):
File "rmsd.py", line 34, in <module>
ref = io.loadmol("tempref.pdb")
File "/home/aathiranair/.local/lib/python3.8/site-packages/spyrmsd/io.py", line 66, in loadmol
mol = load(fname)
NameError: name 'load' is not defined
I tried uninstalling and reinstalling the spyrmsd module, but I still face the same issue.
I also tried creating a virtual environment and running the script but faced the same issue.
(ihub_proj) aathiranair@aathiranair-Inspiron-5406-2n1:~/Desktop/Ihub$ python3 rmsd.py lig_ref.pdb ligand_vina_out.pdbqt
Reference Molecule:lig_ref.pdb
PDBqt File:ligand_vina_out.pdbqt
Traceback (most recent call last):
File "rmsd.py", line 34, in <module>
ref = io.loadmol("tempref.pdb")
File "/home/aathiranair/Desktop/Ihub/ihub_proj/lib/python3.8/site-packages/spyrmsd/io.py", line 66, in loadmol
mol = load(fname)
NameError: name 'load' is not defined
The tempref.pdb file looks like this:
ATOM 1 O6 LIG 359 2.349 1.014 7.089 0.00 0.00
ATOM 9 H LIG 359 1.306 1.691 9.381 0.00 0.00
ATOM 2 C2 LIG 359 0.029 4.120 8.082 0.00 0.00
ATOM 3 O9 LIG 359 -1.106 2.491 9.345 0.00 0.00
ATOM 4 C1 LIG 359 -0.204 3.890 0.337 0.00 0.00
ATOM 5 S5 LIG 359 -0.355 4.108 4.075 0.00 0.00
ATOM 8 C4 LIG 359 -3.545 1.329 7.893 0.00 0.00
ATOM 7 C7 LIG 359 -1.133 5.150 9.406 0.00 0.00
ATOM 6 C3 LIG 359 -0.064 1.805 8.234 0.00 0.00
It seems that to use the io module, one of OpenBabel or RDKit is required: without a backend, load is never defined inside spyrmsd/io.py, which is exactly the NameError you are seeing. Also, make sure NumPy is installed.
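Before calling io.loadmol, you can check which backend is importable. A minimal sketch, assuming "rdkit" and "openbabel" are the top-level module names of the two backends (available_backends is a hypothetical helper, not part of spyrmsd):

```python
# Check whether a backend spyrmsd can use is importable, without
# importing spyrmsd itself.
import importlib.util

def available_backends():
    # find_spec returns None when the module cannot be found
    return [name for name in ("rdkit", "openbabel")
            if importlib.util.find_spec(name) is not None]

backends = available_backends()
if not backends:
    print("No backend found: install RDKit or OpenBabel before using spyrmsd.io")
```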
I have a PDB file which contains two molecules (receptor and ligand).
Each molecule has its own header. All in ONE PDB file.
The header of the receptor section looks like this (lines 1-6 of the PDB file):
HEADER rec.pdb
REMARK original generated coordinate pdb file
ATOM 1 N GLY A 1 -51.221 -13.970 37.091 1.00 0.00 RA0 N
ATOM 2 H GLY A 1 -50.383 -13.584 37.482 1.00 0.00 RA0 H
ATOM 3 CA GLY A 1 -50.902 -15.071 36.197 1.00 0.00 RA0 C
ATOM 4 C GLY A 1 -49.525 -15.659 36.443 1.00 0.00 RA0 C
and this is the ligand section (lines 11435 to 11440) of the PDB file:
HEADER lig.000.00.pdb
ATOM 1 N MET A 1 27.318 -26.957 12.663 1.00 0.00 LA0 N
ATOM 2 H MET A 1 27.313 -27.570 11.870 1.00 0.00 LA0 H
ATOM 3 CA MET A 1 28.374 -27.102 13.668 1.00 0.00 LA0 C
ATOM 4 CB MET A 1 28.531 -28.564 14.090 1.00 0.00 LA0 C
ATOM 5 CG MET A 1 27.224 -29.154 14.628 1.00 0.00 LA0 C
Note that the receptor and ligand sections also contain the strings RA0 and LA0
in the 11th column of the PDB file.
What I want to do is rename the chain of the receptor as chain A and the ligand as chain B.
To do that, I intended to extract both parts into two different objects first,
then rename the chains, and finally put them together again.
I tried this code with Bio3D, but it doesn't work:
library(bio3d)
pdb_infile <- "myfile.pdb"
pdb <- read.pdb(pdb_infile)
receptor_segment.sele <- atom.select(pdb, segid = "RA0", verbose = TRUE)
receptor_pdb <- trim.pdb(pdb, receptor_segment.sele)
ligand_segment.sele <- atom.select(pdb, segid = "LA0", verbose = TRUE)
ligand_pdb <- trim.pdb(pdb, ligand_segment.sele) # showed no entry
What's the way to do it? I'm open to a solution in R or Python.
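One possible approach in Python, sketched under the assumption that the file follows the standard PDB fixed-column layout (chain ID in column 22, segment ID in columns 73-76); rename_chains is a hypothetical helper, not part of any library:

```python
# Map each segment ID to the desired chain ID.
SEG_TO_CHAIN = {"RA0": "A", "LA0": "B"}

def rename_chains(infile, outfile, seg_to_chain=SEG_TO_CHAIN):
    with open(infile) as fin, open(outfile, "w") as fout:
        for line in fin:
            if line.startswith(("ATOM", "HETATM")) and len(line) >= 76:
                segid = line[72:76].strip()   # columns 73-76: segment ID
                if segid in seg_to_chain:
                    # column 22 (index 21) holds the chain identifier
                    line = line[:21] + seg_to_chain[segid] + line[22:]
            fout.write(line)
```

Since the whole file is rewritten line by line, the receptor and ligand never need to be split into separate objects; if your ATOM records are not strictly column-aligned, the indices above would need adjusting.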
I have a txt file with elements:
705.95 117.81 1242.00 252.43 5.02
1036.12 183.52 1242.00 375.00 1.96
124.11 143.43 296.91 230.32 10.70
0.00 0.00 0.00 0.00 4.84
0.00 6.60 112.99 375.00 17.50
0.00 186.66 14.82 375.00 8.23
695.36 162.75 820.66 263.08 12.84
167.61 134.45 417.75 222.10 27.61
0.00 0.00 0.00 0.00 6.86
0.00 0.00 0.00 0.00 11.76
I want to delete lines that contain 0.00 0.00 0.00 0.00 as the first four elements of the line. How can I do that using Python? Your help is highly appreciated.
with open('file.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if not line.startswith('0.00 0.00 0.00 0.00'):
                outfile.write(line)
Here we open file.txt with your lines for reading and output.txt for writing the result. Then, we iterate over each line of the input file and write the line in the results file if it doesn't start with '0.00 0.00 0.00 0.00'.
If you want to overwrite the files in place, without creating separate output files, you can try the following. It also iterates through all the text files in the current directory.
import glob

for i in glob.glob("*.txt"):
    with open(i, "r+") as f:
        content = f.readlines()
        f.truncate(0)
        f.seek(0)
        for line in content:
            if not line.startswith("0.00 0.00 0.00 0.00"):
                f.write(line)
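A caveat with both versions: startswith compares a literal prefix, so a row whose zeros are separated by tabs or extra spaces slips through. A sketch of a more robust variant that compares the first four whitespace-separated tokens instead (the two sample rows are written out first so the sketch runs standalone; in practice file.txt already exists):

```python
# Write sample rows from the question so the sketch is self-contained.
with open("file.txt", "w") as f:
    f.write("0.00 0.00 0.00 0.00 4.84\n")
    f.write("705.95 117.81 1242.00 252.43 5.02\n")

def keep_line(line):
    # split() handles tabs and runs of spaces, unlike a literal prefix match
    return line.split()[:4] != ["0.00", "0.00", "0.00", "0.00"]

with open("file.txt") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        if keep_line(line):
            outfile.write(line)
```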
I am stuck trying to query the nearest neighbors of models from a PDB file using scipy's kd-tree. I have currently implemented a brute-force approach where I compare each model's RMSD value to every other model's. I would like to speed up finding each model's nearest neighbors by using a kd-tree.
For reference, a sample of the pdb file I am working with has multiple models in a single file:
MODEL 5
HETATM 1 C1 SIN A 0 13.542 -2.290 0.745 1.00 0.00 C
HETATM 2 O1 SIN A 0 14.446 -2.652 0.010 1.00 0.00 O
HETATM 3 O2 SIN A 0 12.378 -2.189 0.395 1.00 0.00 O
...
TER 627 NH2 A 39
ENDMDL
MODEL 6
HETATM 1 C1 SIN A 0 11.762 2.281 -7.835 1.00 0.00 C
ATOM 26 C TRP A 2 11.341 6.316 -0.847 1.00 0.00 C
ATOM 27 O TRP A 2 11.074 6.179 0.330 1.00 0.00 O
ATOM 28 CB TRP A 2 13.182 7.844 -1.538 1.00 0.00 C
ATOM 29 CG TRP A 2 12.069 8.524 -2.259 1.00 0.00 C
...
HETATM 626 HN2 NH2 A 39 3.093 9.404 -6.782 1.00 0.00 H
TER 627 NH2 A 39
ENDMDL
MODEL 7
HETATM 1 C1 SIN A 0 -16.074 -1.515 -4.262 1.00 0.00 C
HETATM 2 O1 SIN A 0 -16.968 -1.910 -4.992 1.00 0.00 O
...
ATOM 18 OD1 ASP A 1 -12.877 3.426 -8.525 1.00 0.00 O
ATOM 19 OD2 ASP A 1 -13.484 1.785 -9.782 1.00 0.00 O
TER 627 NH2 A 39
ENDMDL
My initial attempt was to represent each model as a list of atom coordinates, where each 3D atom coordinate is itself a list:
print(model_coord)
[
[[1.4579, 0.0, 0.0],... ,[-5.5, 21.5529, 23.7390]],
[[16.5450, 3.3699, 10.1888], ... ,[-0.0963, 24.510883331298828, 20.2952]],
[[17.6256, 2.5858, 12.4808],... ,[-11.6052, 13.1031, 23.8958]]
]
I then received the following error when creating the kd-tree object:
kdtree = scipy.spatial.KDTree(model_coord)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 235, in __init__
self.n, self.m = np.shape(self.data)
ValueError: too many values to unpack
However, converting model_coord into a pandas DataFrame gave me the n-by-m shape required to create the KDTree object, where each row represents a model and each column a 3D atom coordinate:
model_df = pd.DataFrame(model_coord)
print(model_df.to_string())
0 1 2 ...
0 [1.45799, 0.0, 0.0] [3.9140, 2.8670, 0.4530] [7.590, 3.7990, 0.1850] ...
1 [16.5450, 3.3699, 10.1888] [15.9148, 1.9402, 13.6552] [14.4702, 2.6485, 17.0995] ...
2 [17.6256, 2.5858, 12.4808] [16.4266, 2.2781, 16.0749] [12.6480, 2.6846, 16.0066] …
Here is my attempt to query the nearest neighbors of a model within a radius, where epsilon is the radius:
kdtree = scipy.spatial.KDTree(model_df)
for index, model in model_df.iterrows():
    model_nn_dist, model_nn_ids = kdtree.query(model, distance_upper_bound=epsilon)
I received the following error because the coordinates are list objects:
model_nn_dist, model_nn_ids=kdtree.query(model,distance_upper_bound=epsilon)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 521, in query
hits = self.__query(x, k=k, eps=eps, p=p,distance_upper_bound=distance_upper_bound)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 320, in __query
side_distances = np.maximum(0,np.maximum(x-self.maxes,self.mins-x))
TypeError: unsupported operand type(s) for -: 'list' and 'list'
I attempted to resolve this by converting the atom coordinates into NumPy arrays; however, this is the error I receive:
model_nn_dist, model_nn_ids = kdtree.query(model,distance_upper_bound=epsilon)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 521, in query
hits = self.__query(x, k=k, eps=eps, p=p, distance_upper_bound=distance_upper_bound)
File "/Library/Python/2.7/site-packages/scipy/spatial/kdtree.py", line 320, in __query
side_distances = np.maximum(0,np.maximum(x-self.maxes,self.mins-x))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am wondering if there is a better approach or a more suitable data structure to query nearest neighbors of models or sets of coordinates, using kd-trees.
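One workable layout, assuming every model has the same atoms in the same order and no superposition is needed: flatten each model into a single 3*n_atoms vector. The Euclidean distance between two flattened models then equals RMSD * sqrt(n_atoms), so a kd-tree over the flattened vectors can answer RMSD-radius queries directly. A sketch with random placeholder data standing in for the parsed PDB coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

# Placeholder data: 10 models x 50 atoms x 3 coordinates. In practice this
# array would come from parsing the MODEL/ENDMDL blocks of the PDB file.
rng = np.random.default_rng(0)
models = rng.random((10, 50, 3))

n_models, n_atoms, _ = models.shape
flat = models.reshape(n_models, -1)   # (n_models, 3 * n_atoms): one row per model

# For identically ordered, unaligned coordinates:
#   ||a - b||_2 == rmsd(a, b) * sqrt(n_atoms)
# so an RMSD cutoff epsilon becomes a Euclidean radius epsilon * sqrt(n_atoms).
epsilon = 0.5
tree = cKDTree(flat)
neighbors = tree.query_ball_point(flat, r=epsilon * np.sqrt(n_atoms))
```

Each entry of neighbors is the list of model indices within the RMSD cutoff of that model (including itself). Note this does not superpose models first; if alignment matters, a kd-tree over raw coordinates no longer corresponds to the aligned RMSD.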
While adding to a list, I found the following error.
I have to use the word label.
However, the following error is displayed.
What should I do?
Python code:
asd = 1
def data():
    if 1 == asd:
        f = open('sar2.txt','r')
        a = []
        for i in f:
            a.append(i.split())
        for c in a:
            c.pop(0)
            c.pop(1)
        a = [[i[0],float(i[1]),float(i[2]),float(i[3]),float(i[4]),float(i[5])] for i in a]
        b = [{type: 'date', label: 'Season Start Date'},'user','nice','system','iowait','idle']
        a.insert(0,b)
        f.close()
        return a
data()
sar2.txt :
2017/06/29 00:01:01 all 0.24 0.00 0.16 0.27 99.33
2017/06/29 00:02:01 all 0.13 0.00 0.04 0.13 99.70
2017/06/29 00:03:01 all 1.05 0.00 0.38 0.26 98.30
2017/06/29 00:04:01 all 0.44 0.00 0.10 0.15 99.32
2017/06/29 00:05:01 all 0.25 0.00 0.08 0.22 99.45
Error :
b = [{type: 'date', label: 'Season Start Date'},'user','nice','system','iowait','idle']
NameError: global name 'label' is not defined
How can I fix the error?
I guess you want to use strings, not variables, as keys in your dict, so your code should look like this:
b = [{'type': 'date', 'label': 'Season Start Date'},'user','nice','system','iowait','idle']
You also used type as a key for your dict. It won't give you an error, because there is a built-in type() function, but you should never use a built-in like that as a key for your dictionary.
Reducing to the minimal error:
>>> {asdf: 3}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'asdf' is not defined
>>> {'asdf': 3}
{'asdf': 3}
When creating the dictionary, you probably intended to have strings for keys.
I have a simple data file to plot.
Here are the contents of the data file, which I named "ttry":
0.27 0
0.28 0
0.29 0
0.3 0
0.31 0
0.32 0
0.33 0
0.34 0
0.35 0
0.36 0
0.37 0
0.38 0.00728737997257
0.39 0.0600137174211
0.4 0.11488340192
0.41 0.157321673525
0.42 0.193158436214
0.43 0.233882030178
0.44 0.273319615912
0.45 0.311556927298
0.46 0.349879972565
0.47 0.387602880658
0.48 0.424211248285
0.49 0.460390946502
0.5 0.494855967078
0.51 0.529406721536
0.52 0.561814128944
0.53 0.594307270233
0.54 0.624228395062
0.55 0.654492455418
0.56 0.683984910837
0.57 0.711762688615
0.58 0.739368998628
0.59 0.765775034294
0.6 0.790895061728
0.61 0.815586419753
0.62 0.840192043896
0.63 0.863082990398
0.64 0.886231138546
0.65 0.906292866941
0.66 0.915809327846
0.67 0.911436899863
0.68 0.908179012346
0.69 0.904749657064
0.7 0.899519890261
0.71 0.895147462277
0.72 0.891632373114
0.73 0.888803155007
0.74 0.884687928669
0.75 0.879029492455
0.76 0.876114540466
0.77 0.872170781893
0.78 0.867541152263
0.79 0.86274005487
0.8 0.858367626886
0.81 0.854080932785
0.82 0.850994513032
0.83 0.997170781893
0.84 1.13477366255
0.85 1.24296982167
0.86 1.32690329218
0.87 1.40397805213
0.88 1.46836419753
0.89 1.52306241427
0.9 1.53232167353
0.91 1.52906378601
0.92 1.52211934156
0.93 1.516718107
0.94 1.51543209877
0.95 1.50660150892
0.96 1.50137174211
0.97 1.49408436214
0.98 1.48816872428
0.99 1.48088134431
1 1.4723079561
Then I use matplotlib.pyplot.plotfile to plot it. Here is my Python script:
from matplotlib import pyplot
pyplot.plotfile("ttry", cols=(0,1), delimiter=" ")
pyplot.show()
However, the following error appears:
C:\WINDOWS\system32\cmd.exe /c ttry.py
Traceback (most recent call last):
File "E:\research\ttry.py", line 2, in <module>
pyplot.plotfile("ttry",col=(0,1),delimiter=" ")
File "C:\Python33\lib\site-packages\matplotlib\pyplot.py", line 2311, in plotfile
checkrows=checkrows, delimiter=delimiter, names=names)
File "C:\Python33\lib\site-packages\matplotlib\mlab.py", line 2163, in csv2rec
rows.append([func(name, val) for func, name, val in zip(converters, names, row)])
File "C:\Python33\lib\site-packages\matplotlib\mlab.py", line 2163, in <listcomp>
rows.append([func(name, val) for func, name, val in zip(converters, names, row)])
File "C:\Python33\lib\site-packages\matplotlib\mlab.py", line 2031, in newfunc
return func(val)
ValueError: invalid literal for int() with base 10: '0.00728737997257'
shell returned 1
Hit any key to close this window...
Apparently, Python just treats the y-axis data as int. So how do I tell Python that the y-axis data are floats?
plotfile infers an int type for your second column based on the first few values, which are all integers. To make it check all rows, add checkrows=0 to the arguments, that is:
pyplot.plotfile("ttry", cols=(0,1), delimiter=" ", checkrows=0)
It's an argument that comes from matplotlib.mlab.csv2rec.
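Note that pyplot.plotfile was deprecated and removed in recent Matplotlib releases. A version-independent sketch loads the columns with NumPy, which always parses values as float, and plots them directly (two sample rows are written out first so the sketch runs standalone; in practice "ttry" already exists):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Write sample rows from the question so the sketch is self-contained.
with open("ttry", "w") as f:
    f.write("0.37 0\n0.38 0.00728737997257\n")

data = np.loadtxt("ttry")        # loadtxt parses every value as float
plt.plot(data[:, 0], data[:, 1])
plt.savefig("ttry.png")
```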