Single molecule SMIRKS
The ChemPerGraph objects are intended to create SMIRKS from molecules based on specified atoms. The easiest way to do this is to give ChemPerGraphFromMol a molecule and the indices of its key atoms.
These objects were created largely as a precursor to ClusterGraph. However, some people may find them useful as standalone objects. To the best of my knowledge, there was not previously a tool to create a SMIRKS pattern for a whole molecule. RDKit has a way to write molecules as “SMARTS”, but as far as I can tell it just writes a SMILES string with square brackets around the atoms.
Below are a variety of example ChemPerGraphs.
[ ]:
from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.fragment_graph import ChemPerGraphFromMol
Simple SMIRKS
The simplest SMIRKS patterns contain only the indexed atoms.
ChemPerGraphFromMol objects are initialized with a chemper molecule and the indices of the atoms to label. In the example below the indices are passed as a tuple, and the position in the tuple sets the SMIRKS index:
[2]:
mol = mol_toolkit.MolFromSmiles('C1CCC1')
# atom 0 gets SMIRKS index 1 and atom 4 gets SMIRKS index 2
smirks_atoms = (0, 4)
graph = ChemPerGraphFromMol(mol, smirks_atoms)
graph.as_smirks()
[2]:
'[#6AH2X4x2r4+0:1]-!@[#1AH0X1x0r0+0:2]'
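Each bracketed atom in the output above packs several SMARTS primitives together. As a reading aid, here is a minimal sketch that splits them apart with a regular expression. It is not part of chemper: the names ATOM_PATTERN and describe_atoms are hypothetical, and the pattern assumes every atom is written in exactly the field order seen in the output above.

```python
import re

# One atom as written in the output above, e.g. "[#6AH2X4x2r4+0:1]".
# Assumption: the fields always appear in this order.
ATOM_PATTERN = re.compile(
    r"\[#(?P<atomic_number>\d+)"       # #6  -> atomic number
    r"(?P<aromaticity>[Aa])"           # A   -> aliphatic, a -> aromatic
    r"H(?P<hydrogens>\d+)"             # H2  -> attached hydrogen count
    r"X(?P<connectivity>\d+)"          # X4  -> total connections
    r"x(?P<ring_connectivity>\d+)"     # x2  -> ring bonds
    r"r(?P<ring_size>\d+)"             # r4  -> size of smallest ring
    r"(?P<charge>[+-]\d+)"             # +0  -> formal charge
    r"(?::(?P<smirks_index>\d+))?\]"   # :1  -> SMIRKS index (optional)
)


def describe_atoms(smirks):
    """Return one dict of decoded primitives per atom in the SMIRKS string."""
    atoms = []
    for match in ATOM_PATTERN.finditer(smirks):
        fields = match.groupdict()
        index = fields["smirks_index"]
        atoms.append({
            "atomic_number": int(fields["atomic_number"]),
            "aromatic": fields["aromaticity"] == "a",
            "hydrogens": int(fields["hydrogens"]),
            "connectivity": int(fields["connectivity"]),
            "ring_connectivity": int(fields["ring_connectivity"]),
            "ring_size": int(fields["ring_size"]),
            "charge": int(fields["charge"]),
            "smirks_index": int(index) if index is not None else None,
        })
    return atoms


simple_smirks = "[#6AH2X4x2r4+0:1]-!@[#1AH0X1x0r0+0:2]"
for atom in describe_atoms(simple_smirks):
    print(atom)
```

Read this way, atom :1 is an aliphatic carbon with two hydrogens in a four-membered ring, and atom :2 is one of its hydrogens. The bond between them, -!@, is a single bond that is not in a ring.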
Extend away from the indexed atoms
You can also extend away from the indexed atoms using the optional layers argument. If layers is greater than 0 then atoms up to that many bonds away from the indexed atoms are added to the graph.
Let's start with just one layer:
[3]:
graph = ChemPerGraphFromMol(mol, smirks_atoms, layers=1)
graph.as_smirks()
[3]:
'[#6AH2X4x2r4+0:1](-!@[#1AH0X1x0r0+0:2])(-!@[#1AH0X1x0r0+0])(-@[#6AH2X4x2r4+0])-@[#6AH2X4x2r4+0]'
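One way to picture the layers argument is as a breadth-first walk outward from the indexed atoms that stops after a fixed number of bonds. The sketch below illustrates that idea in plain Python. It is not chemper's implementation: build_cyclobutane_bonds and atoms_within_layers are hypothetical helpers, and the hydrogen numbering (hydrogens 4 and 5 on carbon 0, and so on) is an assumption chosen to agree with the outputs above.

```python
from collections import deque


def build_cyclobutane_bonds():
    """Adjacency for cyclobutane: carbons 0-3 form the ring, hydrogens are 4-11.

    Assumed numbering: carbon c carries hydrogens 4 + 2*c and 5 + 2*c.
    """
    neighbors = {atom: set() for atom in range(12)}

    def add_bond(a, b):
        neighbors[a].add(b)
        neighbors[b].add(a)

    for c in range(4):
        add_bond(c, (c + 1) % 4)   # ring bond to the next carbon
        add_bond(c, 4 + 2 * c)     # first hydrogen
        add_bond(c, 5 + 2 * c)     # second hydrogen
    return neighbors


def atoms_within_layers(neighbors, start_atoms, layers):
    """Return atoms at most `layers` bonds from any start atom ('all' = no limit)."""
    max_depth = None if layers == 'all' else layers
    depth = {atom: 0 for atom in start_atoms}
    queue = deque(start_atoms)
    while queue:
        atom = queue.popleft()
        if max_depth is not None and depth[atom] >= max_depth:
            continue  # do not expand past the requested number of layers
        for neighbor in neighbors[atom]:
            if neighbor not in depth:
                depth[neighbor] = depth[atom] + 1
                queue.append(neighbor)
    return set(depth)


bonds = build_cyclobutane_bonds()
for layers in (0, 1, 'all'):
    selected = atoms_within_layers(bonds, (0, 4), layers)
    print(layers, sorted(selected))
```

With one layer the walk selects five atoms, the same number that appear in the SMIRKS above: the two indexed atoms, the second hydrogen on the indexed carbon, and its two ring neighbors.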
Encode the whole molecule
The other option for layers is 'all', which keeps adding atoms until none are left in the molecule. These SMIRKS quickly become unreadable for humans, but they do encode all information about the molecule.
[4]:
graph = ChemPerGraphFromMol(mol, smirks_atoms, layers='all')
graph.as_smirks()
[4]:
'[#6AH2X4x2r4+0:1](-!@[#1AH0X1x0r0+0:2])(-!@[#1AH0X1x0r0+0])-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])(-!@[#1AH0X1x0r0+0])-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])(-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0]'
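Because a string like this is hard to check by eye, a quick sanity check is to count its atoms by element. The following is a small sketch in plain Python, not a chemper feature; count_elements is a hypothetical helper that only reads the atomic number primitive at the start of each bracketed atom.

```python
import re
from collections import Counter


def count_elements(smirks):
    """Count atoms in a SMIRKS string by atomic number (reads the '#n' primitive)."""
    return Counter(int(number) for number in re.findall(r"\[#(\d+)", smirks))


# The layers='all' output from above, split across lines for readability.
whole_molecule = (
    "[#6AH2X4x2r4+0:1](-!@[#1AH0X1x0r0+0:2])(-!@[#1AH0X1x0r0+0])"
    "-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])(-!@[#1AH0X1x0r0+0])"
    "-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])"
    "(-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0])"
    "-!@[#1AH0X1x0r0+0]"
)

counts = count_elements(whole_molecule)
print(dict(counts))
```

The count comes to four carbons and eight hydrogens, which is the full C4H8 formula of cyclobutane.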