Single molecule SMIRKS

The ChemPerGraph objects create SMIRKS patterns from molecules based on specified atoms. The easiest way to do this is to give ChemPerGraphFromMol a molecule and the key atoms that should carry SMIRKS indices.

These objects were created largely as a precursor to ClusterGraph, but some people may find them useful as standalone objects. To the best of my knowledge, no earlier tool created a SMIRKS pattern for a whole molecule. RDKit has a way to write molecules as “SMARTS”, but as far as I can tell it just writes a SMILES string with square brackets around the atoms.

Below are several examples of ChemPerGraphs.

[ ]:
from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.fragment_graph import ChemPerGraphFromMol

Simple SMIRKS

The simplest SMIRKS patterns contain only the indexed atoms.

ChemPerGraphFromMol objects are initialized with a chemper molecule and the atom indices that should receive SMIRKS indices. The example here passes a tuple, so each atom's position in the tuple determines its SMIRKS index:

[2]:
mol = mol_toolkit.MolFromSmiles('C1CCC1')
# atom 0 gets SMIRKS index 1 and atom 4 gets SMIRKS index 2
smirks_atoms = (0, 4)
graph = ChemPerGraphFromMol(mol, smirks_atoms)
graph.as_smirks()
[2]:
'[#6AH2X4x2r4+0:1]-!@[#1AH0X1x0r0+0:2]'
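The decorators in this output follow standard SMARTS conventions: `#6` is the atomic number, `A` marks an aliphatic atom, `H2` is the hydrogen count, `X4` the connectivity, `x2` the ring connectivity, `r4` the smallest ring size (written as `r0` here for atoms outside any ring), `+0` the formal charge, and `:1` the SMIRKS index. The bond `-!@` is a single bond that is not in a ring. As a reading aid, here is a minimal sketch that splits such a string into its fields. The helper `describe_atoms` is hypothetical and not part of chemper, and it assumes every atom is written with exactly the decorators shown above.

```python
import re

# Hypothetical helper, not part of chemper. It assumes every atom is
# written in the fully decorated form shown in the output above.
ATOM_PATTERN = re.compile(
    r"\[#(?P<atomic_number>\d+)"
    r"(?P<aromaticity>[Aa])"
    r"H(?P<hydrogen_count>\d+)"
    r"X(?P<connectivity>\d+)"
    r"x(?P<ring_connectivity>\d+)"
    r"r(?P<smallest_ring>\d+)"
    r"(?P<charge>[+-]\d+)"
    r"(?::(?P<smirks_index>\d+))?\]"
)

INTEGER_FIELDS = (
    "atomic_number",
    "hydrogen_count",
    "connectivity",
    "ring_connectivity",
    "smallest_ring",
    "charge",
)


def describe_atoms(smirks):
    """Return one dictionary of decorator values per atom in the string."""
    atoms = []
    for match in ATOM_PATTERN.finditer(smirks):
        fields = match.groupdict()
        atom = {key: int(fields[key]) for key in INTEGER_FIELDS}
        atom["aromatic"] = fields["aromaticity"] == "a"
        # Atoms without a ":n" suffix carry no SMIRKS index.
        index = fields["smirks_index"]
        atom["smirks_index"] = int(index) if index is not None else None
        atoms.append(atom)
    return atoms


for atom in describe_atoms('[#6AH2X4x2r4+0:1]-!@[#1AH0X1x0r0+0:2]'):
    print(atom)
```

Read this way, the pattern describes a ring carbon with two hydrogens (SMIRKS index 1) bonded to one of its hydrogens (SMIRKS index 2).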

Extend away from the indexed atoms

You can also extend away from the indexed atoms using the optional layers argument. If layers is greater than 0 then atoms up to that many bonds away from the indexed atoms are added to the graph.

Let's start with just one layer away:

[3]:
graph = ChemPerGraphFromMol(mol, smirks_atoms, layers=1)
graph.as_smirks()
[3]:
'[#6AH2X4x2r4+0:1](-!@[#1AH0X1x0r0+0:2])(-!@[#1AH0X1x0r0+0])(-@[#6AH2X4x2r4+0])-@[#6AH2X4x2r4+0]'

Encode the whole molecule

The other option for layers is ‘all’, which keeps adding atoms until no atoms in the molecule are left. These SMIRKS are very hard for humans to read, but they encode all of the information about the molecule.

[4]:
graph = ChemPerGraphFromMol(mol, smirks_atoms, layers='all')
graph.as_smirks()
[4]:
'[#6AH2X4x2r4+0:1](-!@[#1AH0X1x0r0+0:2])(-!@[#1AH0X1x0r0+0])-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])(-!@[#1AH0X1x0r0+0])-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])(-@[#6AH2X4x2r4+0](-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0])-!@[#1AH0X1x0r0+0]'
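To make the effect of the layers argument concrete, here is a minimal sketch of how atoms could be selected layer by layer. It is not chemper's implementation: the function `atoms_within_layers` is hypothetical, and the bond list for cyclobutane assumes the hydrogens are numbered after the carbons, with atoms 4 and 5 attached to atom 0 (consistent with the outputs above, where atom 4 is a hydrogen bonded to atom 0).

```python
# Hypothetical illustration, not chemper's implementation.
# Assumed atom numbering for cyclobutane: carbons 0-3 form the ring and
# hydrogens 4-11 follow, two per carbon.
bonds = [
    (0, 1), (1, 2), (2, 3), (3, 0),   # ring bonds
    (0, 4), (0, 5), (1, 6), (1, 7),   # C-H bonds
    (2, 8), (2, 9), (3, 10), (3, 11),
]

# Build an adjacency map: atom index -> set of bonded atom indices.
neighbors = {}
for a, b in bonds:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)


def atoms_within_layers(neighbors, indexed_atoms, layers=0):
    """Collect the indexed atoms plus every atom within `layers` bonds.

    `layers` is a non-negative integer or the string 'all', which keeps
    expanding until no new atoms are found.
    """
    selected = set(indexed_atoms)
    frontier = set(indexed_atoms)
    depth = 0
    while frontier and (layers == 'all' or depth < layers):
        # Atoms one bond beyond the current frontier that are not yet selected.
        next_frontier = set()
        for atom in frontier:
            next_frontier |= neighbors[atom] - selected
        selected |= next_frontier
        frontier = next_frontier
        depth += 1
    return selected


print(sorted(atoms_within_layers(neighbors, (0, 4))))            # [0, 4]
print(sorted(atoms_within_layers(neighbors, (0, 4), layers=1)))  # [0, 1, 3, 4, 5]
print(len(atoms_within_layers(neighbors, (0, 4), layers='all'))) # 12
```

With one layer the selection holds five atoms, matching the five atoms in the layers=1 SMIRKS above. With 'all' it holds all twelve atoms of the molecule.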