ChemPer is open source. If you have suggestions or concerns,
please add them to our issue tracker.
Documentation for contributing to this work will be available soon.
Below is an outline of tools provided in
ChemPer. See examples
for more details on how to use these tools.
As noted in installation, we seek to keep
ChemPer independent of the cheminformatics toolkit.
The mol_toolkits module isolates all code that depends
on the installed toolkit. It can create molecules from
an RDKit or OpenEye molecule object or from a SMILES string.
It includes a variety of functions for extracting information
about atoms, bonds, and molecules.
Also included here are SMIRKS pattern searches.
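One way to picture this kind of toolkit isolation is a small lookup that reports which supported toolkit can be imported. The sketch below is illustrative only: `available_toolkits` is a hypothetical helper, not part of ChemPer, and it assumes the two toolkits are importable under the module names `openeye` and `rdkit`.

```python
from importlib.util import find_spec

# Module names assumed for the two toolkits mentioned above.
TOOLKIT_MODULES = {"OpenEye": "openeye", "RDKit": "rdkit"}


def available_toolkits(modules=TOOLKIT_MODULES):
    """Return the names of toolkits whose top-level module can be found."""
    return [name for name, module in modules.items()
            if find_spec(module) is not None]


found = available_toolkits()
print(found if found else "no supported toolkit found")
```

Code that needs a molecule can then go through a single wrapper layer instead of importing either toolkit directly.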
The goal of SingleGraph was to create an example of how you could create a SMIRKS pattern from a molecule and a set of atom indices. Creating SMIRKS for one molecule may not be useful for sampling chemical perception in the long run. However, to the best of our knowledge, it is a tool that did not previously exist. For a detailed example, see single_mol_smirks.
Here is a brief example of using
SingleGraph to create a SMIRKS pattern.
In this case, we want to create a pattern for the carbon-carbon bond in ethanol (SMILES CCO).
The carbon atoms have the indices 0 and 1 in the molecule, specified using the tuple (0,1).
These atoms are assigned to SMIRKS indices :1 and :2, respectively.
The layers argument controls how far the pattern extends from the indexed atoms:
0 includes only the indexed atoms, 1 also includes atoms up to one bond away,
and 'all' includes every atom in the molecule.

    from chemper.mol_toolkits.mol_toolkit import Mol
    from chemper.graphs.single_graph import SingleGraph

    # make molecule from SMILES
    smiles = 'CCO'
    mol = Mol.from_smiles(smiles)
    tagged = (0,1) # atom in carbon-carbon bond

    # try multiple options for layers
    for layers in [0, 1, 'all']:
        # make graph and extract SMIRKS
        graph = SingleGraph(mol, tagged, layers)
        print(graph.as_smirks()) # complex SMIRKS with all decorators are the default
        print(graph.as_smirks(compress=True)) # compressed SMIRKS have only atomic numbers

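To make the effect of layers concrete, the sketch below walks outward from the tagged atoms over a hand-written bond list for ethanol. It is a plain-Python illustration, not ChemPer's implementation: `atoms_within_layers` and `ETHANOL_BONDS` are hypothetical names, and the hydrogen numbering (3 to 8) is assumed to follow the carbons and oxygen.

```python
from collections import deque

# Ethanol ('CCO') with explicit hydrogens, written out by hand (assumed numbering):
# atoms 0 and 1 are carbons, 2 is oxygen, 3-8 are hydrogens.
ETHANOL_BONDS = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]


def atoms_within_layers(bonds, tagged, layers):
    """Return sorted atom indices within `layers` bonds of the tagged atoms.

    `layers` may be a non-negative integer or the string 'all'.
    """
    # Build an adjacency map from the bond list.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search starting from every tagged atom at distance 0.
    distance = {atom: 0 for atom in tagged}
    queue = deque(tagged)
    while queue:
        atom = queue.popleft()
        if layers != 'all' and distance[atom] >= layers:
            continue  # do not expand beyond the requested number of layers
        for nbr in neighbors.get(atom, ()):
            if nbr not in distance:
                distance[nbr] = distance[atom] + 1
                queue.append(nbr)
    return sorted(distance)


for layers in [0, 1, 'all']:
    print(layers, atoms_within_layers(ETHANOL_BONDS, (0, 1), layers))
```

With 0 layers only the two carbons are kept, with 1 layer the hydroxyl hydrogen is still excluded, and 'all' reaches every atom.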
The goal of ClusterGraph is to store all information about the atoms and bonds that could be in a SMIRKS pattern. These are created assuming you already have a clustered set of molecular fragments. Our primary goal is to determine chemical perception for force field parameterization. We imagine parameters for each molecule (for example bond lengths and force constants) could be clustered by fragment. Then we could generate a hierarchical list of SMIRKS patterns that maintain those clusters for typing purposes. However, you could imagine other reasons for wanting to store how you clustered groups of atoms – for example, using atom or bond types in a machine learning model.
For more detailed examples and an illustration of how this works, see the smirks_from_molecules example. Below is a brief example showing the SMIRKS for the carbon-carbon single bond in ethanol and propene.

    from chemper.mol_toolkits.mol_toolkit import Mol
    from chemper.graphs.cluster_graph import ClusterGraph

    # make molecules from smiles
    mols = [
        Mol.from_smiles('CCO'),
        Mol.from_smiles('CC=C')
    ]

    # identify atoms for tagging
    # one list of tagged atom tuples per molecule
    tagged = [[ (0,1) ], # one set of atoms in first molecule
              [ (0,1) ]  # one set of atoms in second molecule
             ]

    # try multiple options for layers
    for layers in [0,1,'all']:
        # make graph
        graph = ClusterGraph(mols, tagged, layers)
        print(graph.as_smirks()) # complex is the default output
        print(graph.as_smirks(compress=True)) # and's common decorators to the end of each atom

The idea with ClusterGraph objects is that they store all
possible decorator information for each atom. As an illustration,
consider carbon-carbon bonds tagged in propane (mol1, SMILES CCC)
and pentane (mol2, SMILES CCCCC). In propane, the SMIRKS-indexed atoms
are one of the terminal carbons and the middle carbon. In pentane,
however, the first atom can be either a terminal carbon or a carbon
in the middle of the chain. This changes the number of hydrogen atoms
(the Hn decorator) on the carbon,
so there are two possible SMIRKS patterns for atom :1:
#6AH2X4x0r0+0 or (indicated by the ",")
#6AH3X4x0r0+0. Atom :2, however, has only one possibility, #6AH2X4x0r0+0.
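The way alternative decorators combine with "," can be sketched in a few lines. This is an illustration of the idea only, not how ClusterGraph is implemented: `or_atom` is a hypothetical helper, and the decorator strings are copied from the propane and pentane description above.

```python
def or_atom(decorators, smirks_index):
    """Join the distinct decorator strings seen for one atom with ',' (OR)."""
    unique = sorted(set(decorators))
    return "[{}:{}]".format(",".join(unique), smirks_index)


# Decorators observed for each indexed atom across the tagged bonds:
# atom :1 is sometimes a terminal (H3) and sometimes a middle (H2) carbon,
# while atom :2 is always a middle (H2) carbon.
atom1_seen = ["#6AH3X4x0r0+0", "#6AH3X4x0r0+0", "#6AH2X4x0r0+0"]
atom2_seen = ["#6AH2X4x0r0+0", "#6AH2X4x0r0+0", "#6AH2X4x0r0+0"]

print(or_atom(atom1_seen, 1))
print(or_atom(atom2_seen, 2))
```

Duplicates collapse to one entry, so an atom that always looks the same across the cluster ends up with a single decorator set.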
Let’s assume you have a few clusters of fragments that you want assigned the same force field parameter. For example, you could have clusters of carbon-carbon bonds based on the type of bond between them (single, double, etc). In this case ChemPer would use the SMIRKSifier to generate a hierarchical list of SMIRKS patterns for those clusters. This process creates SMIRKS using ClusterGraph and then takes a stochastic approach to removing unnecessary decorators. See the general_smirks_for_clusters example for how this process could be applied to different bonding parameters.

    from chemper.mol_toolkits.mol_toolkit import Mol
    from chemper.smirksify import SMIRKSifier, print_smirks

    # make molecules from smiles
    mols = [
        Mol.from_smiles('CCO'),
        Mol.from_smiles('CC=C')
    ]

    # make clusters for each of the 6 bond types:
    # carbon-carbon single bond 1 ethanol, 1 propene
    cc_single = ('cc_single', [ [(0,1)], [ (0,1) ] ] )
    # carbon-carbon double bond 0 ethanol, 1 propene
    cc_double = ('cc_double', [ [ ], [ (1,2) ] ] )
    # carbon-oxygen bond 1 ethanol, 0 propene
    co = ('co', [ [ (1,2) ], [ ] ] )
    # hydrogen-tetrahedral carbon 5 ethanol, 3 propene
    hc_tet = ('hc_tet', [ [ (0,3), (0,4), (0,5), (1,6), (1,7) ],
                          [ (0,3), (0,4), (0,5)] ] )
    # hydrogen-planar carbon bond 0 ethanol, 3 propene
    hc_plan = ('hc_plan', [ [ ], [ (1,6), (2,7), (2,8)] ])
    # hydrogen-oxygen bond 1 ethanol, 0 propene
    ho = ('ho', [ [ (2,8) ], [ ] ] )

    # initiate SMIRKSifier with default max_layers = 5 and verbose = True
    fier = SMIRKSifier(mols, [cc_single, cc_double, co, hc_tet, hc_plan, ho])

    # print initial SMIRKS
    print_smirks(fier.current_smirks)

    # Reduce SMIRKS with default 1000 iterations
    fier.reduce()

    # print final SMIRKS
    print_smirks(fier.current_smirks)

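Because the cluster lists are written by hand, a quick consistency check before building a SMIRKSifier can catch indexing mistakes. The sketch below uses plain Python and does not call ChemPer: `check_clusters` is a hypothetical helper, and treating (0,1) and (1,0) as the same bond is an assumption that suits bond parameters only.

```python
# Clusters copied from the example above:
# (label, [tagged bonds in ethanol, tagged bonds in propene]).
clusters = [
    ('cc_single', [[(0, 1)], [(0, 1)]]),
    ('cc_double', [[], [(1, 2)]]),
    ('co', [[(1, 2)], []]),
    ('hc_tet', [[(0, 3), (0, 4), (0, 5), (1, 6), (1, 7)],
                [(0, 3), (0, 4), (0, 5)]]),
    ('hc_plan', [[], [(1, 6), (2, 7), (2, 8)]]),
    ('ho', [[(2, 8)], []]),
]


def check_clusters(clusters, n_mols):
    """Count tagged bonds per molecule and reject bonds listed twice."""
    seen = [set() for _ in range(n_mols)]
    for label, per_mol in clusters:
        if len(per_mol) != n_mols:
            raise ValueError(f"{label}: expected {n_mols} lists, got {len(per_mol)}")
        for mol_idx, tags in enumerate(per_mol):
            for tag in tags:
                key = frozenset(tag)  # assumption: atom order does not matter for bonds
                if key in seen[mol_idx]:
                    raise ValueError(f"{label}: bond {tag} repeated in molecule {mol_idx}")
                seen[mol_idx].add(key)
    return [len(bonds) for bonds in seen]


print(check_clusters(clusters, n_mols=2))
```

Ethanol and propene each have eight bonds once hydrogens are explicit, so a count of eight per molecule means every bond has been placed in exactly one cluster.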