Sanitization error applying reaction to molecule RDKit

Sanitization error applying reaction to molecule RDKit - python

Sanitization error while applying a reaction to an molecule with wedged bond.
I am getting this error while applying a proton removal reaction to a molecule but I do not see any error in MolBlock information.
This is for a reaction problem in which I am trying to apply a simple reaction (proton removal) to a molecule given its isomeric SMILES.
I create a function to apply reaction using SMARTS and SMILES but I am getting the following error which I could not fixed.
I am using the following code to load my inputs.
smile = rdkit.Chem.rdmolfiles.MolToSmiles(mol,isomericSmiles=True)
which leads to:
C/C1=C\\C[C##H]([C+](C)C)CC/C(C)=C/CC1
I create the following dictionary to use my SMILES and SMARTS:
reaction_smarts = {}
# proton removal reaction
reaction_smarts["proton_removal"] = "[Ch:1]-[C+1:2]>>[C:1]=[C+0:2].[H+]"
reactions = {name: AllChem.ReactionFromSmarts(reaction_smarts[name]) for name in reaction_smarts}
# function to run reactions
def run_reaction(molecule, reaction):
products = []
for product in reaction.RunReactant(molecule, 0):
Chem.SanitizeMol(product[0])
products.append(product[0])
return products
# apply reaction
products = run_reaction(cation_to_rdkit_mol["mol_name"], reactions["proton_removal"])
At this step I am getting this error but I cannot fix it.
RDKit ERROR: [10:43:23] Explicit valence for atom # 0 C, 5, is greater than permitted
Expected results should be the the molecule with the double bond and its stereoisomers:
First product: CC(C)=C1C/C=C(\\C)CC/C=C(\\C)CC1
Second product: C=C(C)[C##H]1C/C=C(\\C)CC/C=C(\\C)CC1
Third product: C=C(C)[C#H]1C/C=C(\\C)CC/C=C(\\C)CC1
I am using Chem.EnumerateStereoisomers.EnumerateStereoisomers() to get all stereoisomers but I am just getting the first and second product. I also added your initial proposal product[0].GetAtomWithIdx(0).SetNumExplicitHs(0) which actually fix the Explicit valence error. But now I am trying to figure it out how to get all that three stereoisomers.
Any hint why this is happening?, cause if I check the mol block with all the info about valence it seems to be fine.

The Error is stating that the Explicit valence for atom 0 (Carbon) is 5, this would suggest that the explicit hydrogen hasn't been removed although the bond is now a double bond, hence a valence of 5. I am not too familiar with reaction SMARTS although an easy way to fix this manually would be to set the number of explicit hydrogens on atom 0 to 0 before you sanitize:
product.GetAtomWithIdx(0).SetNumExplicitHs(0)
Chem.SanitizeMol(product)
Edit 1:
Scratch that, I did some experimentation, try this reaction:
rxn = AllChem.ReactionFromSmarts('[#6##H:1]-[#6+:2] >> [#6H0:1]=[#6+0:2]')
This way in the reaction definition we explicitly state that a hydrogen is lost and the resultant molecule will sanitize. Does this work for you?
Edit 2:
When I run this reaction the product does not seem to contain a cation:
mol = Chem.MolFromSmiles('C/C1=C\\C[C##H]([C+](C)C)CC/C(C)=C/CC1')
rxn = AllChem.ReactionFromSmarts('[#6##H:1]-[#6+:2] >> [#6H0:1]=[#6+0:2]')
products = list()
for product in rxn.RunReactant(mol, 0):
Chem.SanitizeMol(product[0])
products.append(product[0])
print(Chem.MolToSmiles(products[0]))
Output:
'CC(C)=C1C/C=C(\\C)CC/C=C(\\C)CC1'
Edit 3:
I think I now understand what you are looking for:
mol = Chem.MolFromSmiles('C/C1=C\\C[C##H]([C+](C)C)CC/C(C)=C/CC1')
# Reactant SMARTS
reactant_smarts = '[CH3:1][C+:2][C##H:3]'
# Product SMARTS
product_smarts = [
'[CH2:1]=[CH0+0:2][CH:3]',
'[CH2:1]=[CH0+0:2][C#H:3]',
'[CH2:1]=[CH0+0:2][C##H:3]',
]
# Reaction SMARTS
reaction_smarts = str(reactant_smarts + '>>' + '.'.join(product_smarts))
# RDKit Reaction
rxn = AllChem.ReactionFromSmarts(reaction_smarts)
# Get Products
results = list()
for products in rxn.RunReactant(mol, 0):
for product in products:
Chem.SanitizeMol(product)
results.append(product)
print(Chem.MolToSmiles(product))
Output:
'C=C(C)C1C/C=C(\\C)CC/C=C(\\C)CC1'
'C=C(C)[C#H]1C/C=C(\\C)CC/C=C(\\C)CC1'
'C=C(C)[C##H]1C/C=C(\\C)CC/C=C(\\C)CC1'
'C=C(C)C1C/C=C(\\C)CC/C=C(\\C)CC1'
'C=C(C)[C#H]1C/C=C(\\C)CC/C=C(\\C)CC1'
'C=C(C)[C##H]1C/C=C(\\C)CC/C=C(\\C)CC1'
Note that we get the same products twice, I think this is because the reactant SMARTS matches both CH3 groups, hence the reaction is applied to both. I hope this is what you are looking for.

Related

Why I can't get the shadow price from Gurobipy while the model is feasible?

I'm trying to solve peer to peer market optimization problem. Since the price of each trade between different participants can be different, I want to draw the price of each trade. And the problem should be a continuous model, because the trade amount can be at any level.
The model already works fine without problem, I can get the trade amount of each trade using like model.variables.trade[t,k,y].x.
But the problem is when I try to get the shadow price from the related constraints, it shows : GurobiError: Unable to retrieve attribute 'Pi'.
I can't find out how comes this error since the model is already feasible, and what I do is get an attribute which already exists.
The related part of my code shows as follows:
def _build_constraints(self):
m = self.model
P_nm3 = self.variables.P_nm3
agents = self.data.agents
timewindow = self.data.timewindow
windowinterval = self.data.windowinterval
budget = self.data.budget
P_n = self.variables.P_n
self.constraints.pnmmatch = {}
......
for t in self.data.interval:
for i in range(len(P_nm3[t])):
for j in range(len(P_nm3[t][0])):
self.constraints.pnmmatch[t,i,j] = m.addConstr(
P_nm3[t][i][j],
gb.GRB.EQUAL,
-P_nm3[t][j][i]
)
.......
poolmarket = marketclear()
poolmarket.optimize()
print(poolmarket.constraints.pnmmatch[0,1,6])
print(type(poolmarket.constraints.pnmmatch))
print(poolmarket.constraints.pnmmatch[0,1,6].pi)
shadow_price = poolmarket.model.getAttr(gb.GRB.Attr.Pi)
The result is:
<gurobi.Constr R1001>
<class 'dict'>
AttributeError: Unable to retrieve attribute 'pi'
both poolmarket.constraints.pnmmatch[0,1,6].pi or poolmarket.model.getAttr(gb.GRB.Attr.Pi) doesn't work.
If I comment on the last two lines about pi, the models works pretty fine.
But in a simpler similar model where the constraint is:
for t in self.data.interval:
for i in range(len(P_nm3[t])):
for j in range(len(P_nm3[t][0])):
self.constraints.pnmmatch[t] = m.addConstr(
P_nm3[t][i][j],
gb.GRB.EQUAL,
-P_nm3[t][j][i]
)
Then I'm able to draw the pi value with poolmarket.constraints.pnmmatch[0].pi command.
How can I solve this?
Thanks in advance!

Flux variability analysis only for transport reactions between compartments?

I would like to do a FVA only for selected reactions, in my case on transport reactions between compartments (e.g. between the cytosol and mitochondrion). I know that I can use selected_reactions in doFVA like this:
import cbmpy as cbm
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
cbm.doFVA(mod, selected_reactions=['R_FORtm', 'R_CO2tm'])
Is there a way to get the entire list of transport reactions, not only the two I manually added? I thought about
selecting the reactions based on their ending tm but that fails for 'R_ORNt3m' (and probably other reactions, too).
I want to share this model with others. What is the best way of storing the information in the SBML file?
Currently, I would store the information in the reaction annotation as in
this answer. For example
mod.getReaction('R_FORtm').setAnnotation('FVA', 'yes')
which could be parsed.

There is no built-in function for this kind of task. As you already mentioned, relying on the IDs is generally not a good idea as those can differ between different databases, models and groups (e.g. if someone decided just to enumerate reactions from r1 till rn and or metabolites from m1 till mm, filtering based on IDs fails). Instead, one can make use of the compartment field of the species. In CBMPy you can access a species' compartment by doing
import cbmpy as cbm
import pandas as pd
mod = cbm.CBRead.readSBML3FBC('iMM904.xml.gz')
mod.getSpecies('M_atp_c').getCompartmentId()
# will return 'c'
# run a FBA
cbm.doFBA(mod)
This can be used to find all fluxes between compartments as one can check for each reaction in which compartment their reagents are located. A possible implementation could look as follows:
def get_fluxes_associated_with_compartments(model_object, compartments, return_values=True):
# check whether provided compartment IDs are valid
if not isinstance(compartments, (list, set) or not set(compartments).issubset(model_object.getCompartmentIds())):
raise ValueError("Please provide valid compartment IDs as a list!")
else:
compartments = set(compartments)
# all reactions in the model
model_reactions = model_object.getReactionIds()
# check whether provided compartments are identical with the ones of the reagents of a reaction
return_reaction_ids = [ri for ri in model_reactions if compartments == set(si.getCompartmentId() for si in
model_object.getReaction(ri).getSpeciesObj())]
# return reaction along with its value
if return_values:
return {ri: model_object.getReaction(ri).getValue() for ri in return_reaction_ids}
# return only a list with reaction IDs
return return_reaction_ids
So you pass your model object and a list of compartments and then for each reaction it is checked whether there is at least one reagent located in the specified compartments.
In your case you would use it as follows:
# compartment IDs for mitochondria and cytosol
comps = ['c', 'm']
# you only want the reaction IDs; remove the ', return_values=False' part if you also want the corresponding values
trans_cyt_mit = get_fluxes_associated_with_compartments(mod, ['c', 'm'], return_values=False)
The list trans_cyt_mit will then contain all desired reaction IDs (also the two you specified in your question) which you can then pass to the doFVA function.
About the second part of your question. I highly recommend to store those reactions in a group rather than using annotation:
# create an empty group
mod.createGroup('group_trans_cyt_mit')
# get the group object so that we can manipulate it
cyt_mit = mod.getGroup('group_trans_cyt_mit')
# we can only add objects to a group so we get the reaction object for each transport reaction
reaction_objects = [mod.getReaction(ri) for ri in trans_cyt_mit]
# add all the reaction objects to the group
cyt_mit.addMember(reaction_objects)
When you now export the model, e.g. by using
cbm.CBWrite.writeSBML3FBCV2(mod, 'iMM904_with_groups.xml')
this group will be stored as well in SBML. If a colleague reads the SBML again, he/she can then easily run a FVA for the same reactions by accessing the group members which is far easier than parsing the annotation:
# do an FVA; fva_res: Reaction, Reduced Costs, Variability Min, Variability Max, abs(Max-Min), MinStatus, MaxStatus
fva_res, rea_names = cbm.doFVA(mod, selected_reactions=mod.getGroup('group_trans_cyt_mit').getMemberIDs())
fva_dict = dict(zip(rea_names, fva_res.tolist()))
# store results in a dataframe which makes the selection of reactions easier
fva_df = pd.DataFrame.from_dict(fva_dict, orient='index')
fva_df = fva_df.rename({0: "flux_value", 1: "reduced_cost_unscaled", 2: "variability_min", 3: "variability_max",
4: "abs_diff_var", 5: "min_status", 6: "max_status"}, axis='columns')
Now you can easily query the dataframe and find the flexible and not flexible reactions within your group:
# filter the reactions with flexibility
fva_flex = fva_df.query("abs_diff_var > 10 ** (-4)")
# filter the reactions that are not flexible
fva_not_flex = fva_df.query("abs_diff_var <= 10 ** (-4)")

Python find average of element in list with multiple elements

I have a ticker that grabs current information of multiple elements and adds it to a list in the format: trade_list.append([[trade_id, results]]).
Say we're tracking trade_id's 4555, 5555, 23232, the trade_list will keep ticking away adding their results to the list, I then want to find the averages of their results individually.
The code works as such:
Find accounts
for a in accounts:
find open trades of accounts
for t in range(len(trades)):
do some math
trades_list.append(trade_id,result)
avernum = 0
average = []
for r in range(len(trades_list)):
average.append(trades_list[r][1]) # This is the value attached to the trade_id
avernum+=1
results = float(sum(average)/avernum))
results_list.append([[trade_id,results]])
This fills out really quickly. This is after two ticks:
print(results_list)
[[[53471, 28.36432]], [[53477, 31.67835]], [[53474, 32.27664]], [[52232, 1908.30604]], [[52241, 350.4758]], [[53471, 28.36432]], [[53477, 31.67835]], [[53474, 32.27664]], [[52232, 1908.30604]], [[52241, 350.4758]]]
These averages will move and change very quickly. I want to use results_list to track and watch them, then compare previous averages to current ones
Thinking:
for r in range(len(results_list)):
if results_list[r][0] == trade_id:
restick.append(results_list[r][1])
resnum = len(restick)
if restick[resnum] > restick[resnum-1]:
do fancy things

Here is some short code that does what you I think you have described, although I might have misunderstood. You basically do exactly what you say; select everything that has a certain trade_id and returns its average.:
TID_INDEX = 0
DATA_INDEX = 1
def id_average(t_id, arr):
filt_arr = [i[DATA_INDEX] for i in arr if i[TID_INDEX] == t_id]
return sum(filt_arr)/len(filt_arr)

Linq, Python or Sql, need advice for TSS WSS BSS calculation

Hi i am making a staditical soft in c++ with QT.
I need to make many calculation over a table with the output of multivariate cluster analysis:
Var1,Var2,Var3,..VarN, k2,k3,k4...kn
where Var1 to n are the variables of study,
and k2 to kn the cluster clasification.
Table Example:
Var1,Var2,Var3,Var4,k2,k3,k4,k5,k6
3464.57,2992.33,2688.33,504.79,2,3,2,3,2
2895.32,3365.35,2824.35,504.86,1,2,3,2,6
2249.32,3300.19,2382.19,504.92,2,1,4,3,4
3417.81,3311.04,2426.04,504.97,1,2,2,5,2
3329.66,3497.14,2467.14,505.03,2,2,1,4,2
3087.85,3653.53,2296.53,505.09,2,1,2,3,4
The c++ storage will be defined like:
QList table;
Struct record
{
QList<double> vars;
QList<int> cluster;
}
I need to calculate the total, the within group and the between group square sum.
https://en.wikipedia.org/wiki/F-test
So by example to calculate WSS for Var1 and k2 need to:
in pseudo code:
get the size of every group:
count(*) group by(k2),
calculate the mean of every group:
sum(Var1) group by(k2), and then divide every one by the previous count.
compute the diference:
pow((xgroup1-xmeangroup1),2)
and many other operations....
Which alternatives will have more easy and powerfull codification:
1)Create a MySQL table on the fly and make SQL operations.
2)Use LINQ, but i does not if QT have QTLinq class.
3)Try to make trough Python Equivalents of LINQ Methods,
(how is the interaction between QT and Python, I see that Qgis have many plugin writed in Python)
Also in my app need to many make other calculus.
I hope to be clear.
Greetings

After some time I respond to my self,
the solution was maked in Python with Pandas.
This link is very ussefull:
Iterating through groups on: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Also the book "Python for Data Analysis, West McKinney" pag 255
This video show how to make calculation:
ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy
https://www.youtube.com/watch?v=j9ZPMlVHJVs
[code]
def getDFrameFixed2D():
y = np.array([3,2,1,5,3,4,5,6,7])
k = np.array([1,1,1,2,2,2,3,3,3])
clusters = pd.DataFrame([[a,b] for a,b in zip(y,k)],columns=['Var1','K2'])
# print (clusters.head()) print("shape(0):",clusters.shape[0])
return clusters
X2D=getDFrameFixed2D()
MainMean = X2D['Var1'].mean(0)
print("Main mean:",MainMean)
grouped = X2D['Var1'].groupby(X2D['K2'])
print("-----Iterating Over Groups-------------")
Wss=0
Bss=0
for name, group in grouped:
#print(type(name))
#print(type(group))
print("Group key:",name)
groupmean = group.mean(0)
groupss = sum((group-groupmean)**2)
print(" groupmean:",groupmean)
print(" groupss:",groupss)
Wss+= groupss
Bss+= ((groupmean - MainMean)**2)*len(group)
print("----------------------------------")
print("Wss:",Wss)
print("Bss:",Bss)
print("T=B+W:",Bss+Wss)
Tss = np.sum((X-X.mean(0))**2)
print("Tss:",Tss)
print("----------------------------------")
[/code]
I am sure that could be do with aggregates(lambdas func) or apply.
But i don´t figure how
(if somebody know, please post here)
Greetings

How to modify SPSS output files with Python?

I'm making custom tables in SPSS, but when the cell values (percentages) are rounded to 1 decimal, they sometimes add up to 99,9 or 100,1 in stead of 100,0. My boss asked my to have everything neatly add up to 100. This means slightly changing some values in the output tables.
I wrote some code to retrieve cell values from tables, which works fine, but I cannot find any method or class that allows me to change cells in already generated output. I've tried things like :
Table[(rij,6)] = CellText.Number(11)
and
SpssDataCells[(rij,6)] = CellText.Number(11)
but it keeps giving me "AttributeError: 'SpssClient.SpssTextItem' object has no attribute 'DataCellArray'"
How do I succesfully change cell values of output tables in SPSS?
My code so far:
import SpssClient, spss
# Python verbinden met SPSS.
SpssClient.StartClient()
OutputDoc = SpssClient.GetDesignatedOutputDoc()
OutputItemList = OutputDoc.GetOutputItems()
# Laatste tabel pakken.
lastTab = OutputItemList.Size()-2
OutputItem = OutputItemList.GetItemAt(lastTab)
Table = OutputItem.GetSpecificType()
SpssDataCells = Table.DataCellArray()
# For loop. Voor iedere rij testen of de afgeronde waarden optellen tot 100.
# Specifieke getallen pakken.
rij=0
try:
while (rij<20):
b14 = float(SpssDataCells.GetUnformattedValueAt(rij,0))
z14 = float(SpssDataCells.GetUnformattedValueAt(rij,1))
zz14 = float(SpssDataCells.GetUnformattedValueAt(rij,2))
b15 = float(SpssDataCells.GetUnformattedValueAt(rij,4))
z15 = float(SpssDataCells.GetUnformattedValueAt(rij,5))
zz15 = float(SpssDataCells.GetUnformattedValueAt(rij,6))
print [b14,z14,zz14,b15,z15,zz15]
rij=rij+1
except:
print 'Einde tabel'

The SetValueAt method is what you require to change the value of a cell in a table.
PS. I think your boss should focus on more important things than to spend billable time on having percentages add up neatly to 100% (due to rounding). Also ensure you are using as many decimal point precision as possible in your calculations so to minimize this "discrepancy".
Update:
Just to give an example of what you can do with manipulation like this (beyond fixing round errors):
The table above shows the Share of Voice (SoV) of a Respiratory drug brand (R3) and it's rank among all brands (first two columns of data) and SoV & Rank also within it's same class of brands only (third and forth column). This is compared against previous month (July 15) and if the rank has increased then it is highlighted in green and an upward facing arrow is added and if declined in rank then highlighted in red and downward red facing arrow added. Just adds a little, color and visualization to what otherwise can be dull tables.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Sanitization error applying reaction to molecule RDKit - python

Related

Why I can't get the shadow price from Gurobipy while the model is feasible?

Flux variability analysis only for transport reactions between compartments?

Python find average of element in list with multiple elements

Linq, Python or Sql, need advice for TSS WSS BSS calculation

How to modify SPSS output files with Python?

Categories

Resources