ChemPer API

ChemPer is open source. If you have suggestions or concerns please add them to our issue tracker. Documentation for contributing to this work will be available soon. Below is an outline of tools provided in ChemPer. See examples for more details on how to use these tools.


As noted in installation, we seek to keep ChemPer independent of the cheminformatics toolkit. The mol_toolkits module isolates all code that depends on the installed toolkit. It can create molecules from an RDKit (RDK) or OpenEye (OE) molecule object or from a SMILES string. It includes a variety of functions for extracting information about atoms, bonds, and molecules. SMIRKS pattern searches are also handled here.
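The toolkit-independence idea can be sketched in a few lines. This is not ChemPer's implementation: `pick_toolkit`, `TOOLKIT_MODULES`, and the preference order are hypothetical names and choices made for illustration. The sketch only checks which toolkit packages are importable and reports the first one found, which is the kind of decision a wrapper layer has to make before delegating to toolkit-specific code.

```python
import importlib.util

# Hypothetical preference order; ChemPer's actual selection logic may differ.
TOOLKIT_MODULES = (("openeye", "openeye"), ("rdkit", "rdkit"))


def pick_toolkit(is_available=None):
    """Return the name of the first importable toolkit, or None if none is found."""
    if is_available is None:
        # find_spec returns None when a top-level package is not installed
        def is_available(module):
            return importlib.util.find_spec(module) is not None
    for name, module in TOOLKIT_MODULES:
        if is_available(module):
            return name
    return None


print(pick_toolkit())  # result depends on what is installed locally
```

Passing a custom `is_available` callable makes the selection testable without either toolkit installed.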


The goal of SingleGraph was to create an example of how you could create a SMIRKS pattern from a molecule and a set of atom indices. Creating SMIRKS for one molecule may not be useful for sampling chemical perception in the long run. However, to the best of our knowledge, it is a tool that did not previously exist. For a detailed example, see single_mol_smirks.

Here is a brief usage example for using SingleGraph to create a SMIRKS pattern. In this case, we want to create a pattern for the carbon-carbon bond in ethanol (SMILES CCO). The carbon atoms have the indices 0 and 1 in the molecule, specified using the tuple (0,1). These atoms are assigned to SMIRKS indices :1 and :2 respectively. The layers argument controls how much of the surrounding molecule is included: 0 keeps only the indexed atoms, 1 also includes atoms up to one bond away from them, and 'all' includes the whole molecule. The example tries each of these options.

from chemper.mol_toolkits.mol_toolkit import Mol
from chemper.graphs.single_graph import SingleGraph

# make molecule from SMILES
smiles = 'CCO'
mol = Mol.from_smiles(smiles)

tagged = (0,1)   # atoms in carbon-carbon bond
# try multiple options for layers
for layers in [0, 1, 'all']:
    # make graph and extract SMIRKS
    graph = SingleGraph(mol, tagged, layers)
    print(graph.as_smirks())   # complex SMIRKS with all decorators is the default
    print(graph.as_smirks(compress=True))   # compressed SMIRKS have only atomic numbers
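To make the role of layers concrete, here is a toolkit-free sketch of the atom-selection step. It is an illustration under stated assumptions, not ChemPer's code: the molecule is a hand-written adjacency dictionary for the SMILES 'CCO' used above, with hydrogens assumed to be numbered after the heavy atoms, and `atoms_within_layers` is a hypothetical helper.

```python
from collections import deque

# Hand-written connectivity for ethanol: C0-C1-O2, hydrogens 3-8 (assumed numbering)
ETHANOL = {
    0: [1, 3, 4, 5],
    1: [0, 2, 6, 7],
    2: [1, 8],
    3: [0], 4: [0], 5: [0],
    6: [1], 7: [1],
    8: [2],
}


def atoms_within_layers(adjacency, tagged, layers):
    """Return atom indices at most `layers` bonds from any tagged atom.

    `layers` may be a non-negative integer or the string 'all'.
    """
    distance = {atom: 0 for atom in tagged}
    queue = deque(tagged)
    while queue:
        atom = queue.popleft()
        if layers != 'all' and distance[atom] >= layers:
            continue  # do not expand past the requested number of layers
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return sorted(distance)


tagged = (0, 1)
# SMIRKS indices are the 1-based positions in the tagged tuple
smirks_index = {atom: position for position, atom in enumerate(tagged, start=1)}
for layers in [0, 1, 'all']:
    print(layers, atoms_within_layers(ETHANOL, tagged, layers))
```

With layers set to 1 the hydroxyl hydrogen (atom 8) is excluded because it is two bonds from the nearest tagged carbon; 'all' picks it up.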


The goal of ClusterGraph is to store all information about the atoms and bonds that could be in a SMIRKS pattern. These are created assuming you already have a clustered set of molecular fragments. Our primary goal is to determine chemical perception for force field parameterization. We imagine parameters for each molecule (for example bond lengths and force constants) could be clustered by fragment. Then we could generate a hierarchical list of SMIRKS patterns that maintain those clusters for typing purposes. However, you could imagine other reasons for wanting to store how you clustered groups of atoms – for example, using atom or bond types in a machine learning model.
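As a rough sketch of what "clustered by fragment" can look like as data, the snippet below groups bonded atom-index pairs by the parameter label they should share. Everything here is hypothetical: the `assignments` input, its labels, and the `build_clusters` helper are made up for illustration and are not part of ChemPer. The output uses the (label, per-molecule lists of index tuples) layout that the SMIRKSifier example later in this document uses.

```python
from collections import defaultdict

# Hypothetical input: for each molecule, a mapping from bonded atom-index pairs
# to the label of the parameter that bond should receive. Labels are made up.
assignments = [
    {(0, 1): "cc_single", (1, 2): "co"},         # molecule 0 (ethanol heavy atoms)
    {(0, 1): "cc_single", (1, 2): "cc_double"},  # molecule 1 (propene heavy atoms)
]


def build_clusters(assignments):
    """Group atom-index tuples by label, keeping one list per molecule."""
    n_mols = len(assignments)
    clusters = defaultdict(lambda: [[] for _ in range(n_mols)])
    for mol_index, bonds in enumerate(assignments):
        for atoms, label in bonds.items():
            clusters[label][mol_index].append(atoms)
    return [(label, per_mol) for label, per_mol in clusters.items()]


for label, per_mol in build_clusters(assignments):
    print(label, per_mol)
```

A molecule that contributes nothing to a cluster keeps an empty list in its slot, so the position in each per-molecule list always lines up with the molecule list.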

For more detailed examples and an illustration of how this works, see the smirks_from_molecules example. Below is a brief example showing the SMIRKS for the bond between two carbon atoms in propane and pentane.

from chemper.mol_toolkits.mol_toolkit import Mol
from chemper.graphs.cluster_graph import ClusterGraph

# make molecules from SMILES
mols = [Mol.from_smiles('CCC'),     # propane (mol1)
        Mol.from_smiles('CCCCC')]   # pentane (mol2)
# identify atoms for tagging
tagged = [[ (0,1) ],          # one set of atoms in first molecule
          [ (0,1), (1,2) ]]   # two sets of atoms in second molecule
# try multiple options for layers
for layers in [0,1,'all']:
    # make graph
    graph = ClusterGraph(mols, tagged, layers)
    print(graph.as_smirks())   # complex is the default output
    print(graph.as_smirks(compress=True))   # ANDs common decorators at the end of each atom

The idea with ClusterGraph objects is that they store all possible decorator information for each atom. In this case the SMIRKS-indexed atoms for propane (mol1) are one of the terminal carbons and the middle carbon. In pentane (mol2), however, the first atom can be either a terminal or a mid-chain carbon. This changes the number of hydrogen atoms (the Hn decorator) on the carbon, so there are two possible SMIRKS patterns for atom :1, #6AH2X4x0r0+0 or #6AH3X4x0r0+0, where the "or" is indicated by the “,”. Atom :2 has only one possibility, #6AH2X4x0r0+0.
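The way alternatives are combined for one atom can be sketched without any toolkit. The snippet below is a simplified illustration, not ChemPer's code: decorators are hand-split into tokens, both helper functions are hypothetical, and the compressed format shown is an assumption about what "ANDing common decorators at the end" could look like rather than ChemPer's exact output.

```python
# Decorator tokens for the two kinds of carbon seen at atom :1 (hand-written)
CH2 = ("#6", "A", "H2", "X4", "x0", "r0", "+0")
CH3 = ("#6", "A", "H3", "X4", "x0", "r0", "+0")


def full_atom_smirks(options, index):
    """OR together the complete decorator string of every observed atom."""
    unique = sorted({"".join(tokens) for tokens in options})
    return "[" + ",".join(unique) + ":" + str(index) + "]"


def compressed_atom_smirks(options, index):
    """OR the decorators that differ, then AND the shared ones at the end.

    The atomic number (first token) is always kept in the OR'd part.
    """
    first = options[0]
    shared = [tok for tok in first[1:] if all(tok in opt for opt in options)]
    varying = sorted(
        {"".join(tok for tok in opt if tok not in shared) for opt in options}
    )
    parts = [",".join(varying)] + shared
    return "[" + ";".join(parts) + ":" + str(index) + "]"


print(full_atom_smirks([CH3, CH2], 1))        # [#6AH2X4x0r0+0,#6AH3X4x0r0+0:1]
print(full_atom_smirks([CH2, CH2], 2))        # [#6AH2X4x0r0+0:2]
print(compressed_atom_smirks([CH3, CH2], 1))  # [#6H2,#6H3;A;X4;x0;r0;+0:1]
```

In SMIRKS, "," is a high-precedence OR and ";" is a low-precedence AND, so the compressed form says "CH2 or CH3, and aliphatic, and four-connected, and so on".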


Let’s assume you have a few clusters of fragments that you want assigned the same force field parameter. For example, you could have clusters of carbon-carbon bonds based on the type of bond between them (single, double, etc). In this case ChemPer would use the SMIRKSifier to generate a hierarchical list of SMIRKS patterns for those clusters. This process creates SMIRKS using ClusterGraph and then takes a stochastic approach to removing unnecessary decorators. See the general_smirks_for_clusters example for how this process could be applied to different bonding parameters.

from chemper.mol_toolkits.mol_toolkit import Mol
from chemper.smirksify import SMIRKSifier, print_smirks

# make molecules from SMILES
mols = [Mol.from_smiles('CCO'),    # ethanol
        Mol.from_smiles('CC=C')]   # propene
# make clusters for each of the 6 bond types:
# carbon-carbon single bond 1 ethanol, 1 propene
cc_single = ('cc_single', [ [(0,1)],  [ (0,1) ] ]  )
# carbon-carbon double bond 0 ethanol, 1 propene
cc_double = ('cc_double', [ [ ], [ (1,2) ] ]  )
# carbon-oxygen bond 1 ethanol, 0 propene
co = ('co', [ [ (1,2) ], [ ] ]  )
# hydrogen-tetrahedral carbon 5 ethanol, 3 propene
hc_tet = ('hc_tet', [ [ (0,3), (0,4), (0,5), (1,6), (1,7) ],  [ (0,3), (0,4), (0,5)] ]  )
# hydrogen-planar carbon bond 0 ethanol, 3 propene
hc_plan = ('hc_plan', [ [ ], [ (1,6), (2,7), (2,8)] ])
# hydrogen-oxygen bond 1 ethanol, 0 propene
ho = ('ho', [ [ (2,8) ],  [ ] ] )

# initialize SMIRKSifier with default max_layers = 5 and verbose = True
fier = SMIRKSifier(mols, [cc_single, cc_double, co, hc_tet, hc_plan, ho])
# print initial SMIRKS
print_smirks(fier.current_smirks)
# Reduce SMIRKS with default 1000 iterations
fier.reduce()
# print final SMIRKS
print_smirks(fier.current_smirks)
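The stochastic removal of unnecessary decorators can be illustrated with a toy model. This is a sketch under loud assumptions, not the SMIRKSifier algorithm: fragments are represented as plain sets of made-up decorator strings instead of molecules, a pattern "matches" when its decorators are a subset of the fragment's, and the last matching pattern in the list is assumed to win. All names (`ITEMS`, `initial_patterns`, `assign`, `reduce_patterns`) are hypothetical.

```python
import random

# Toy fragments: name -> (decorators present, cluster label). All values are made up.
ITEMS = {
    "ethanol_cc": (frozenset({"#6:1", "#6:2", "-", "X4:1", "X4:2"}), "cc_single"),
    "propene_cc": (frozenset({"#6:1", "#6:2", "-", "X4:1", "X3:2"}), "cc_single"),
    "propene_c=c": (frozenset({"#6:1", "#6:2", "=", "X3:1", "X3:2"}), "cc_double"),
    "ethanol_co": (frozenset({"#6:1", "#8:2", "-", "X4:1", "X2:2"}), "co"),
}
ORDER = ["cc_single", "cc_double", "co"]


def initial_patterns(items, order):
    """Start each pattern with every decorator shared by its cluster members."""
    patterns = []
    for label in order:
        members = [set(decs) for decs, lab in items.values() if lab == label]
        patterns.append((label, set.intersection(*members)))
    return patterns


def assign(patterns, decorators):
    """Return the label of the last pattern whose decorators all appear."""
    label = None
    for pattern_label, required in patterns:
        if required <= decorators:
            label = pattern_label  # later patterns override earlier ones
    return label


def typing_is_correct(patterns, items):
    return all(assign(patterns, decs) == lab for decs, lab in items.values())


def reduce_patterns(patterns, items, iterations=200, seed=0):
    """Randomly drop decorators, keeping a removal only if typing is unchanged."""
    rng = random.Random(seed)
    patterns = [(label, set(required)) for label, required in patterns]
    for _ in range(iterations):
        _, required = rng.choice(patterns)
        if not required:
            continue
        candidate = rng.choice(sorted(required))
        required.remove(candidate)
        if not typing_is_correct(patterns, items):
            required.add(candidate)  # revert a removal that breaks the clusters
    return patterns


start = initial_patterns(ITEMS, ORDER)
for label, required in reduce_patterns(start, ITEMS):
    print(label, sorted(required))
```

Because every accepted move is checked against the full set of fragments, the reduced patterns still reproduce the original clusters; which decorators survive depends on the random seed.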