Make SMIRKS from clustered fragments¶
ChemPer’s ClusterGraph creates SMIRKS patterns from a group of user-specified molecular fragments. ClusterGraph collects the SMIRKS decorators from every molecule and stores them in a highly specific SMIRKS pattern.
The ultimate goal for ChemPer is to create a hierarchical list of SMIRKS patterns that retains fragment clustering. We could use this tool to generate SMIRKS patterns for the SMIRNOFF force field format, allowing us to create data-driven, direct chemical perception.
For example, if your initial clusters had 4 types of carbon-carbon bonds (single, aromatic, double, and triple), you would expect the final SMIRKS patterns to reflect those four categories.
The first step here is to store possible decorators for atoms and bonds in a given cluster. In this notebook we will use example SMIRKS patterns as a way of identifying groups of molecular fragments. Then we will use ClusterGraph to create highly specific SMIRKS for these same fragments.
[1]:
# import statements
from chemper.mol_toolkits import mol_toolkit
from chemper.graphs.cluster_graph import ClusterGraph
from chemper.chemper_utils import create_tuples_for_clusters
create_tuples_for_clusters¶
This is a utility function in the module chemper.chemper_utils that extracts the atom indices matching each SMIRKS pattern in an ordered list.
For example, let's assume you wanted to find all of the atoms that match this SMIRKS list:
* “any”, '[*:1]~[*:2]'
* “single”, '[*:1]-[*:2]'
In this case, the “any” pattern would match all bonds, but the “single” pattern, which comes later in the list, would then take all single bonds. If you were typing ethene (C=C), you would expect the double bond between carbon atoms 0 and 1 to match “any” and all C-H bonds to match “single”.
The output in this case would be:
[ ('any', [[ (0, 1) ]] ),
('single', [[ (0, 2), (0, 3), (1,4), (1,5) ]] )
]
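The "later pattern takes precedence" behavior described above can be pictured with a small stand-alone sketch. It does not call ChemPer: simple predicates on bond order stand in for SMIRKS matching, and the names ethene_bonds, patterns, and assign_clusters are hypothetical and used only for illustration.

```python
# Toy illustration only: predicates stand in for SMIRKS matching, and this is
# not how ChemPer itself is implemented.

# Ethene: atoms 0 and 1 are carbons, atoms 2-5 are hydrogens.
# Each bond is (atom_a, atom_b, bond_order).
ethene_bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1), (1, 4, 1), (1, 5, 1)]

# Ordered like a hierarchical SMIRKS list: general first, specific last.
patterns = [
    ("any", lambda order: True),
    ("single", lambda order: order == 1),
]


def assign_clusters(patterns, molecules):
    """Assign every bond to the LAST pattern in the list that matches it."""
    clusters = [(label, [[] for _ in molecules]) for label, _ in patterns]
    for mol_idx, bonds in enumerate(molecules):
        for a, b, order in bonds:
            winner = None
            for pat_idx, (_, matches) in enumerate(patterns):
                if matches(order):
                    winner = pat_idx  # later matches overwrite earlier ones
            if winner is not None:
                clusters[winner][1][mol_idx].append((a, b))
    return clusters


result = assign_clusters(patterns, [ethene_bonds])
print(result)
```

The result has the same shape as the output listed above: one (label, per-molecule list of index tuples) entry per pattern.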
Clustering from other SMIRKS¶
This example attempts to show how ClusterGraph creates a SMIRKS for already clustered sub-graphs.
Here, we will consider two types of angles around tetrahedral carbon atoms. In this hierarchical list, c1 would match ANY angle around a tetrahedral carbon (indicated with the connectivity X4 on atom :2). Then c2 would match angles where both outer atoms are hydrogens, that is, just H-C-H angles, meaning those angles would be assigned c2 and NOT c1.
We will use the utility function create_tuples_for_clusters (described above) to identify atoms in each example molecule that match each of these angle types.
[2]:
smirks_list = [
("c1", "[*:1]~[#6X4:2]-[*:3]"),
("c2", "[#1:1]-[#6X4:2]-[#1:3]"),
]
for label, smirks in smirks_list:
    print(label, '\t', smirks)
c1 [*:1]~[#6X4:2]-[*:3]
c2 [#1:1]-[#6X4:2]-[#1:3]
Start with a single molecule¶
For the first example, we will start with just one molecule (ethane) and extract the clusters of atoms matching each angle type.
Ethane has a total of 12 angles, all of which can be categorized by the two SMIRKS patterns c1 or c2:
* 6 with the form H-C-C - type c1
* 6 with the form H-C-H - type c2
First we need to extract the atoms for each of these categories. We use tuples of atom indices to represent these two clusters, which are identified using the create_tuples_for_clusters utility function.
[3]:
mol = mol_toolkit.MolFromSmiles('CC')
atom_index_list = create_tuples_for_clusters(smirks_list, [mol])
for label, mol_list in atom_index_list:
    print(label)
    for mol_idx, atom_list in enumerate(mol_list):
        print('\tmolecule ', mol_idx)
        for atoms in atom_list:
            print('\t\t', atoms)
c1
molecule 0
(1, 0, 3)
(0, 1, 7)
(0, 1, 6)
(1, 0, 4)
(1, 0, 2)
(0, 1, 5)
c2
molecule 0
(5, 1, 7)
(5, 1, 6)
(6, 1, 7)
(3, 0, 4)
(2, 0, 4)
(2, 0, 3)
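The count of 12 angles, split 6 and 6, can be checked by hand with a short enumeration that needs no toolkit. The adjacency below is an assumption read off the output above (carbons are atoms 0 and 1, hydrogens 2-4 sit on carbon 0, hydrogens 5-7 on carbon 1); the variable names are illustrative.

```python
from itertools import combinations

# Ethane connectivity around each carbon, indexed as in the output above.
neighbors = {
    0: [1, 2, 3, 4],
    1: [0, 5, 6, 7],
}
hydrogens = {2, 3, 4, 5, 6, 7}

hch, hcc = [], []
for center, bonded in neighbors.items():
    # Every unordered pair of neighbors defines one angle around the center.
    for a, b in combinations(bonded, 2):
        if a in hydrogens and b in hydrogens:
            hch.append((a, center, b))  # both outer atoms are hydrogen: c2
        else:
            hcc.append((a, center, b))  # one outer atom is carbon: c1

print(len(hch) + len(hcc), len(hcc), len(hch))
```

Each tetrahedral carbon contributes "4 choose 2" = 6 angles, which is where the total of 12 comes from.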
Next, we will look at the ClusterGraph for the set of atoms matching the angle type c1 ([*:1]~[#6X4:2]-[*:3]). ClusterGraph works by storing only the unique combinations of atom decorators. That means that even though we are using six sets of atoms, there is only one set of decorators for each atom in the SMIRKS pattern.
[6]:
c1_atoms = atom_index_list[0][1]
graph = ClusterGraph([mol], c1_atoms)
print(graph.as_smirks())
[#6AH3X4x0!r+0:1]-;!@[#6AH3X4x0!r+0:2]-;!@[#1AH0X1x0!r+0:3]
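To see why six tuples collapse to a single decorator set per position, here is a toy sketch that does not use ChemPer. The decorator strings and the bond string "-;!@" are copied from the output above rather than computed from a molecule, and the data structures are hypothetical stand-ins for whatever ClusterGraph stores internally.

```python
# Toy illustration only; ClusterGraph derives decorators from the molecule itself.
CARBON = "#6AH3X4x0!r+0"
HYDROGEN = "#1AH0X1x0!r+0"

# Ethane: atoms 0 and 1 are carbons, atoms 2-7 are hydrogens.
decorators = {0: CARBON, 1: CARBON}
decorators.update({h: HYDROGEN for h in range(2, 8)})

# The six c1 tuples found for ethane above.
c1_tuples = [(1, 0, 3), (0, 1, 7), (0, 1, 6), (1, 0, 4), (1, 0, 2), (0, 1, 5)]

# One set of decorator strings per SMIRKS index (:1, :2, :3).
unique = [set(), set(), set()]
for atoms in c1_tuples:
    for position, atom_idx in enumerate(atoms):
        unique[position].add(decorators[atom_idx])

atom_strings = [
    "[{}:{}]".format(",".join(sorted(options)), position + 1)
    for position, options in enumerate(unique)
]
toy_smirks = "-;!@".join(atom_strings)
print(toy_smirks)
```

Because a set keeps only distinct entries, adding the same decorator string six times leaves one option per indexed atom.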
Adding Layers¶
Similar to the ChemPerGraphs described in the single_mol_smirks example, we can add atoms outside those indexed in ClusterGraph. This is done with the keyword layers. The specified number of layers corresponds to the number of bonds away from an indexed atom that should be included in the SMIRKS. As with ChemPerGraphs, you can also use the keyword "all" to include all atoms in a molecule in the SMIRKS pattern. For ethane, this would result in the same SMIRKS as specifying 1 layer:
[7]:
print("layers = 1")
graph = ClusterGraph([mol], c1_atoms, layers=1)
print(graph.as_smirks())
print('-'*80)
print("layers='all'")
graph = ClusterGraph([mol], c1_atoms, layers='all')
print(graph.as_smirks())
layers = 1
[#6AH3X4x0!r+0:1](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#6AH3X4x0!r+0:2](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0:3]
--------------------------------------------------------------------------------
layers='all'
[#6AH3X4x0!r+0:1](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#6AH3X4x0!r+0:2](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0:3]
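One way to picture layers is as a breadth-first walk outward from the indexed atoms, one bond per layer. The sketch below is an assumption about the idea, not ChemPer's implementation; atoms_within_layers and the hand-written ethane adjacency are hypothetical.

```python
# Ethane adjacency: carbons 0 and 1, hydrogens 2-4 on carbon 0, 5-7 on carbon 1.
adjacency = {
    0: [1, 2, 3, 4],
    1: [0, 5, 6, 7],
    2: [0], 3: [0], 4: [0],
    5: [1], 6: [1], 7: [1],
}


def atoms_within_layers(adjacency, indexed_atoms, layers):
    """Return atoms reachable within `layers` bonds of any indexed atom.

    `layers` is a non-negative integer, or the string "all" to keep
    expanding until no new atoms are found.
    """
    included = set(indexed_atoms)
    frontier = set(indexed_atoms)
    depth = 0
    while frontier and (layers == "all" or depth < layers):
        # Step one bond outward, keeping only atoms not seen before.
        frontier = {n for atom in frontier for n in adjacency[atom]} - included
        included |= frontier
        depth += 1
    return included


indexed = (1, 0, 3)  # one of the c1 angles in ethane
print(sorted(atoms_within_layers(adjacency, indexed, 0)))
print(sorted(atoms_within_layers(adjacency, indexed, 1)))
print(sorted(atoms_within_layers(adjacency, indexed, "all")))
```

In ethane every atom is at most one bond away from a carbon, so one layer already reaches the whole molecule, which is consistent with the two identical SMIRKS above.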
Multiple molecules¶
Now that you have the general idea, let's consider a more complex case. Let's create a ClusterGraph for both labels in the smirks_list from above for the hydrocarbons listed below.
First we need to create the molecules and use create_tuples_for_clusters to group the angles by category.
[8]:
smiles = ['CC', 'CCC', 'C1CC1', 'CCCC', 'CC(C)C', 'C1CCC1', 'CCCCC']
mols = [mol_toolkit.MolFromSmiles(s) for s in smiles]
atom_index_list = create_tuples_for_clusters(smirks_list, mols)
for label, mol_list in atom_index_list:
    print(label)
    for mol_idx, atom_list in enumerate(mol_list):
        print('\tmolecule ', mol_idx)
        for atoms in atom_list:
            print('\t\t', atoms)
c1
molecule 0
(1, 0, 3)
(0, 1, 7)
(0, 1, 6)
(1, 0, 4)
(1, 0, 2)
(0, 1, 5)
molecule 1
(1, 0, 3)
(1, 2, 8)
(1, 0, 5)
(1, 0, 4)
(2, 1, 7)
(0, 1, 2)
(1, 2, 9)
(0, 1, 7)
(1, 2, 10)
(0, 1, 6)
(2, 1, 6)
molecule 2
(2, 0, 4)
(1, 2, 8)
(2, 0, 3)
(1, 0, 3)
(0, 2, 8)
(1, 2, 7)
(1, 0, 2)
(2, 1, 5)
(0, 2, 7)
(0, 1, 2)
(0, 1, 6)
(0, 2, 1)
(2, 1, 6)
(0, 1, 5)
(1, 0, 4)
molecule 3
(2, 1, 7)
(0, 1, 8)
(0, 1, 7)
(0, 1, 2)
(1, 2, 9)
(2, 3, 12)
(1, 2, 3)
(1, 2, 10)
(1, 0, 6)
(1, 0, 4)
(3, 2, 10)
(1, 0, 5)
(2, 1, 8)
(2, 3, 11)
(2, 3, 13)
(3, 2, 9)
molecule 4
(2, 1, 7)
(1, 2, 8)
(0, 1, 7)
(0, 1, 2)
(1, 2, 9)
(0, 1, 3)
(1, 2, 10)
(1, 0, 6)
(3, 1, 7)
(2, 1, 3)
(1, 0, 4)
(1, 3, 13)
(1, 0, 5)
(1, 3, 12)
(1, 3, 11)
molecule 5
(1, 0, 3)
(1, 2, 8)
(0, 1, 7)
(2, 1, 7)
(0, 1, 2)
(1, 2, 9)
(3, 0, 4)
(1, 2, 3)
(3, 0, 5)
(1, 0, 4)
(2, 1, 6)
(0, 1, 6)
(1, 0, 5)
(2, 3, 10)
(2, 3, 11)
(0, 3, 11)
(3, 2, 8)
(0, 3, 2)
(0, 3, 10)
(3, 2, 9)
molecule 6
(0, 1, 8)
(0, 1, 2)
(0, 1, 9)
(2, 3, 12)
(1, 2, 3)
(1, 2, 10)
(4, 3, 13)
(1, 0, 6)
(1, 2, 11)
(1, 0, 7)
(3, 4, 16)
(3, 2, 10)
(1, 0, 5)
(2, 1, 8)
(3, 2, 11)
(2, 1, 9)
(2, 3, 13)
(3, 4, 14)
(2, 3, 4)
(4, 3, 12)
(3, 4, 15)
c2
molecule 0
(5, 1, 7)
(5, 1, 6)
(6, 1, 7)
(3, 0, 4)
(2, 0, 4)
(2, 0, 3)
molecule 1
(8, 2, 9)
(6, 1, 7)
(3, 0, 4)
(3, 0, 5)
(9, 2, 10)
(4, 0, 5)
(8, 2, 10)
molecule 2
(5, 1, 6)
(3, 0, 4)
(7, 2, 8)
molecule 3
(11, 3, 13)
(11, 3, 12)
(9, 2, 10)
(7, 1, 8)
(5, 0, 6)
(4, 0, 6)
(12, 3, 13)
(4, 0, 5)
molecule 4
(11, 3, 13)
(11, 3, 12)
(9, 2, 10)
(12, 3, 13)
(8, 2, 9)
(5, 0, 6)
(4, 0, 6)
(4, 0, 5)
(8, 2, 10)
molecule 5
(6, 1, 7)
(8, 2, 9)
(10, 3, 11)
(4, 0, 5)
molecule 6
(8, 1, 9)
(12, 3, 13)
(14, 4, 15)
(5, 0, 6)
(15, 4, 16)
(6, 0, 7)
(14, 4, 16)
(5, 0, 7)
(10, 2, 11)
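As a sanity check on the listing above, the number of tuples per molecule can be predicted from the carbon skeleton alone. The sketch below assumes every carbon is tetrahedral (true for these alkanes), so each carbon has 6 angles and a carbon with h hydrogens has "h choose 2" H-C-H angles. The bond lists are written by hand from the SMILES, and angle_counts is a hypothetical helper, not part of ChemPer.

```python
from math import comb

# Carbon-carbon bonds of each hydrocarbon, written out by hand from its SMILES.
heavy_atom_bonds = {
    "CC": [(0, 1)],
    "CCC": [(0, 1), (1, 2)],
    "C1CC1": [(0, 1), (1, 2), (0, 2)],
    "CCCC": [(0, 1), (1, 2), (2, 3)],
    "CC(C)C": [(0, 1), (1, 2), (1, 3)],
    "C1CCC1": [(0, 1), (1, 2), (2, 3), (0, 3)],
    "CCCCC": [(0, 1), (1, 2), (2, 3), (3, 4)],
}


def angle_counts(bonds):
    """Return (c1_count, c2_count) for an alkane given its C-C bonds."""
    n_carbons = max(max(bond) for bond in bonds) + 1
    carbon_neighbors = [0] * n_carbons
    for a, b in bonds:
        carbon_neighbors[a] += 1
        carbon_neighbors[b] += 1
    total = 6 * n_carbons  # 4 choose 2 angles around each tetrahedral carbon
    # A carbon with d carbon neighbors carries 4 - d hydrogens.
    hch = sum(comb(4 - d, 2) for d in carbon_neighbors)
    return total - hch, hch


counts = {smi: angle_counts(bonds) for smi, bonds in heavy_atom_bonds.items()}
for smi, (n_c1, n_c2) in counts.items():
    print(smi, n_c1, n_c2)
```

The predicted counts agree with the number of tuples listed for each molecule under c1 and c2.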
Now let's make a ClusterGraph object for both c1 and c2. In these patterns you will see lists of decorators on each atom. In the SMIRKS language, ',' stands for ‘OR’. So in the case of "[#6AH1X4x0!r+0,#6AH2X4x0!r+0:1]", either decorator set ("#6AH1X4x0!r+0" or "#6AH2X4x0!r+0") could match atom :1.
[9]:
c1_graph = ClusterGraph(mols, atom_index_list[0][1])
print('c1\n'+'-'*50)
print(c1_graph.as_smirks())
c2_graph = ClusterGraph(mols, atom_index_list[1][1])
print()
print('c2\n'+'-'*50)
print(c2_graph.as_smirks())
c1
--------------------------------------------------
[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:1]-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2]-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:3]
c2
--------------------------------------------------
[#1AH0X1x0!r+0:1]-;!@[#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2]-;!@[#1AH0X1x0!r+0:3]
Identifying common decorators¶
You might notice that some SMIRKS decorators in each atom list are very similar. For example, all of our atoms are neutral, so they all have the decorator "+0" to indicate a formal charge of zero.
We can take advantage of these commonalities and group decorators together using the SMIRKS ";" symbol for ANDing decorators. For example, in "[#6,#7;+0:1]" the atom is either carbon (#6) or (,) nitrogen (#7) and (;) it has a zero formal charge (+0).
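The reading of "," and ";" described above can be sketched with plain string handling. This is a deliberately limited toy, not a SMIRKS parser: it assumes each OR option is a single primitive such as "#6" or "+0", and it ignores "!", "&", and recursive patterns. The function names and the representation of an atom as a set of primitive strings are hypothetical.

```python
def split_atom_pattern(atom):
    """Split '[#6,#7;+0:1]' into its map index and its AND groups of OR options."""
    inner = atom.strip()[1:-1]               # drop the surrounding brackets
    inner, _, index = inner.rpartition(":")  # the map index follows the last ':'
    # ';' binds more loosely than ',', so split on ';' first.
    and_groups = [group.split(",") for group in inner.split(";")]
    return int(index), and_groups


def atom_matches(atom_primitives, and_groups):
    """True if, in every AND group, at least one OR option describes the atom."""
    return all(
        any(option in atom_primitives for option in group)
        for group in and_groups
    )


index, groups = split_atom_pattern("[#6,#7;+0:1]")
print(index, groups)

print(atom_matches({"#7", "+0"}, groups))  # neutral nitrogen
print(atom_matches({"#8", "+0"}, groups))  # neutral oxygen
print(atom_matches({"#6", "+1"}, groups))  # charged carbon
```

Splitting on ";" before "," reflects that ";" is the lowest-precedence operator inside an atom expression.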
In the ChemPer graph language you can group like decorators using the keyword compress. In that case we get these SMIRKS patterns for c1 and c2 instead:
[10]:
print('c1\n'+'-'*50)
print(c1_graph.as_smirks(compress=True))
print()
print('c2\n'+'-'*50)
print(c2_graph.as_smirks(compress=True))
c1
--------------------------------------------------
[*!rH1x0,*!rH2x0,*!rH3x0,*H2r3x2,*H2r4x2;#6;+0;A;X4:1]-[*!rH1x0,*!rH2x0,*!rH3x0,*H2r3x2,*H2r4x2;#6;+0;A;X4:2]-[#1!rH0X1x0,#6!rH2X4x0,#6!rH3X4x0,#6H2X4r3x2,#6H2X4r4x2;+0;A:3]
c2
--------------------------------------------------
[#1AH0X1x0!r+0:1]-;!@[*!rH2x0,*!rH3x0,*H2r3x2,*H2r4x2;#6;+0;A;X4:2]-;!@[#1AH0X1x0!r+0:3]
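Compression can be thought of as factoring out whatever every OR option has in common. The sketch below is a toy reimplementation of that idea and not ChemPer's code: the tokenizer regex only knows the decorator kinds that appear in this notebook, and the function name factor_common and the sorted ordering are assumptions chosen to mirror the output shown above.

```python
import re

# Splits a decorator string such as '#6AH2X4x2r3+0' into its primitives.
TOKEN = re.compile(r"#\d+|!r|A|a|H\d+|X\d+|x\d+|r\d+|[+-]\d+")


def factor_common(options):
    """Rewrite OR'ed decorator strings as 'opt1,opt2;common1;common2'."""
    token_sets = [set(TOKEN.findall(option)) for option in options]
    common = set.intersection(*token_sets)
    or_parts = []
    for tokens in token_sets:
        rest = tokens - common
        element = sorted(t for t in rest if t.startswith("#"))
        others = sorted(t for t in rest if not t.startswith("#"))
        # Use '*' as a placeholder when the element itself was factored out.
        or_parts.append("".join(element or ["*"]) + "".join(others))
    and_part = "".join(";" + token for token in sorted(common))
    return ",".join(sorted(set(or_parts))) + and_part


# Decorator options on atom :2 of the uncompressed c2 pattern.
c2_center = [
    "#6AH2X4x0!r+0",
    "#6AH2X4x2r3+0",
    "#6AH2X4x2r4+0",
    "#6AH3X4x0!r+0",
]
compressed = factor_common(c2_center)
print("[" + compressed + ":2]")
```

Here the element (#6), aromaticity (A), connectivity (X4), and charge (+0) are shared by all four options, so only the hydrogen count and ring information remain inside the OR list.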
Adding layers¶
As shown above, we could also add layers to the ClusterGraphs with multiple molecules.
[11]:
for l in [1,2,3]:
    print('layers = ', l)
    c1_graph = ClusterGraph(mols, atom_index_list[0][1], layers=l)
    print('c1\n'+'-'*50)
    print(c1_graph.as_smirks())
    c2_graph = ClusterGraph(mols, atom_index_list[1][1], layers=l)
    print()
    print('c2\n'+'-'*50)
    print(c2_graph.as_smirks())
    print('\n', '='*80, '\n')
layers = 1
c1
--------------------------------------------------
[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:1](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:3](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0]
c2
--------------------------------------------------
[#1AH0X1x0!r+0:1]-;!@[#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])(-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])-;!@[#1AH0X1x0!r+0:3]
================================================================================
layers = 2
c1
--------------------------------------------------
[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:1](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:3](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0]
c2
--------------------------------------------------
[#1AH0X1x0!r+0:1]-;!@[#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-[#1AH0X1x0!r+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0:3]
================================================================================
layers = 3
c1
--------------------------------------------------
[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:1](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:3](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0]
c2
--------------------------------------------------
[#1AH0X1x0!r+0:1]-;!@[#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2](-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-[#1AH0X1x0!r+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH3X4x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0,#6AH3X4x0!r+0](-;!@[#1AH0X1x0!r+0])(-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0])-;!@[#1AH0X1x0!r+0:3]
================================================================================
Where do you go from here¶
As you see above, the ClusterGraph SMIRKS are significantly more complicated and specific than the input SMIRKS. For example, the input SMIRKS for c1 is [*:1]~[#6X4:2]-[*:3]; however, ClusterGraph creates this monstrosity:
[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:1]-[#6AH1X4x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:2]-[#1AH0X1x0!r+0,#6AH2X4x0!r+0,#6AH2X4x2r3+0,#6AH2X4x2r4+0,#6AH3X4x0!r+0:3]
Although this pattern becomes a bit less complex with the compression:
[*!rH1x0,*!rH2x0,*!rH3x0,*H2r3x2,*H2r4x2;#6;+0;A;X4:1]-[*!rH1x0,*!rH2x0,*!rH3x0,*H2r3x2,*H2r4x2;#6;+0;A;X4:2]-[#1!rH0X1x0,#6!rH2X4x0,#6!rH3X4x0,#6H2X4r3x2,#6H2X4r4x2;+0;A:3]
Our goal is to generate a hierarchical list of SMIRKS that could recover the same chemistry in a different list of molecules. In order to do this, we would want to generate the SMIRKS patterns for different clusters and then remove unnecessary decorators.
To meet this purpose we created the SMIRKSifier. For details on this topic, see the notebook smirksifying_clusters in this example folder.