Cluster Graphs

ClusterGraph is an extension of the initial SingleGraph: it can store information about multiple atoms and bonds simultaneously. This is a starting example for how to use ChemPer’s ClusterGraph class to create SMIRKS patterns from clusters of molecular graphs.

cluster_graph.py

ClusterGraph is a class for tracking all possible SMIRKS decorators in a group (or cluster) of molecular fragments. Moving forward, these will be used to find the minimum number of SMIRKS decorators required for a set of SMIRKS patterns to maintain a given clustering of fragments.

class chemper.graphs.cluster_graph.ClusterGraph(mols=None, smirks_atoms_lists=None, layers=0)

ChemPerGraphs are a graph-based class for storing atom and bond information. They use the chemper.mol_toolkits Atoms, Bonds, and Mols.

class AtomStorage(atoms=None, label=None)

AtomStorage tracks information about an atom.

add_atom(atom)

Expand the current AtomStorage by adding information about a new ChemPer Atom.

Parameters

atom (ChemPer Atom) – atom to add to this storage

as_smirks(compress=False)
Parameters

compress (boolean) – whether decorators common to all sets should be combined, for example ‘#6X4,#7X3;+0!r…’

Returns

smirks – how this atom would be represented in a SMIRKS string with the minimal combination of SMIRKS decorators

Return type

str
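The rule for combining decorator sets can be illustrated with a short standalone sketch. It is not ChemPer’s implementation: the function name `atom_smirks` is hypothetical, each stored atom is assumed to be a tuple of decorator strings in the order seen in this page’s SMIRKS examples (atomic number first), and, following the compressed example given for ClusterGraph.as_smirks, the atomic number is assumed to stay in every OR’d set while a single set is never compressed.

```python
from typing import List, Sequence, Tuple

# Assumed decorator order, taken from the SMIRKS strings on this page:
# (atomic number, aromaticity, H count, connectivity X, ring connectivity x, ring, charge)
Decorators = Tuple[str, ...]


def atom_smirks(decorator_sets: Sequence[Decorators], compress: bool = False) -> str:
    """Hypothetical sketch: write stored decorator sets as one SMIRKS atom body."""
    # Drop duplicate sets while keeping the order in which they were added.
    sets: List[Decorators] = list(dict.fromkeys(decorator_sets))
    if not sets:
        return "*"  # assumption: an empty storage is written as a wildcard
    written = ["".join(s) for s in sets]
    if not compress or len(sets) == 1:
        # Each set is written in full and the sets are OR'd together with ','.
        return ",".join(written)
    # Decorators after the atomic number that every set shares are AND'ed once
    # after ';' instead of being repeated in each OR'd set.
    common = [d for d in sets[0][1:] if all(d in s[1:] for s in sets[1:])]
    if not common:
        return ",".join(written)
    ored = [s[0] + "".join(d for d in s[1:] if d not in common) for s in sets]
    return ",".join(ored) + ";" + "".join(common)


carbons = [
    ("#6", "A", "H2", "X3", "x0", "!r", "+0"),
    ("#6", "A", "H1", "X3", "x0", "!r", "+0"),
]
print(atom_smirks(carbons))                 # #6AH2X3x0!r+0,#6AH1X3x0!r+0
print(atom_smirks(carbons, compress=True))  # #6H2,#6H1;AX3x0!r+0
```

Both printed strings match the first atom of the uncompressed and compressed SMIRKS shown for ClusterGraph.as_smirks.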

compare_atom(atom)

Compares the decorators in this AtomStorage with the provided ChemPer atom. Each stored set of decorators is compared separately and the highest score is returned. For example, suppose this storage had two sets of decorators:

  • #7H1X3x0!r+0A

  • #6H1X4x0!r+0A

and the input atom had the decorators:

  • #6H1X3x2r6+0a

The score is calculated by counting the decorators in common:

  • #7H1X3x0!r+0A and #6H1X3x2r6+0a

    have 3 decorators in common (H1,X3,+0)

  • #6H1X4x0!r+0A and #6H1X3x2r6+0a

    also have 3 decorators in common (#6, H1, +0)

However, atoms with the same atomic number are weighted as more similar: if the atomic numbers do not agree, the score is divided by 10. Therefore the final scores will be:

  • 0.3 for #7H1X3x0!r+0A

  • 3 for #6H1X4x0!r+0A

The highest score for any set of decorators is returned so 3 is the returned score in this example.

Parameters

atom (ChemPer Atom) – atom to compare to the current storage

Returns

score – A score describing how similar the input atom is to any set of decorators currently in this storage, based on its SMIRKS decorators. The score ranges from 0 to 7, where 7 is the number of decorators on any atom: if this atom matches one of the current decorator sets perfectly, then all 7 decorators agree. However, if the atomic number does not agree, that set of decorators is considered less ideal, so the score is the number of other decorators in common divided by 10. If the current storage is empty, the score is 7, since any atom matches a wildcard atom.

Return type

float
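The scoring rule above can be written as a small function. This is a hedged sketch rather than ChemPer’s code: `compare_atom` here is a hypothetical free function that takes the stored decorator sets and the new atom’s decorators directly as tuples of strings, with the atomic number assumed to be the first entry.

```python
from typing import Sequence, Tuple

Decorators = Tuple[str, ...]


def compare_atom(stored: Sequence[Decorators], atom: Decorators) -> float:
    """Hypothetical sketch of the AtomStorage similarity score described above."""
    if not stored:
        # An empty storage acts like a wildcard atom, so every decorator agrees.
        return float(len(atom))
    best = 0.0
    for ref in stored:
        # Number of decorators this stored set shares with the new atom.
        score = float(len(set(ref) & set(atom)))
        # The atomic number is the first entry; a mismatch divides the score by 10.
        if ref[0] != atom[0]:
            score /= 10
        best = max(best, score)
    return best


storage = [
    ("#7", "H1", "X3", "x0", "!r", "+0", "A"),
    ("#6", "H1", "X4", "x0", "!r", "+0", "A"),
]
new_atom = ("#6", "H1", "X3", "x2", "r6", "+0", "a")
print(compare_atom(storage, new_atom))  # 3.0
```

On the example from the text, the nitrogen set scores 0.3 and the carbon set scores 3, so 3.0 is returned.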

make_atom_decorators(atom)

Extract information from a ChemPer Atom that would be useful in a SMIRKS pattern.

Parameters

atom (ChemPer atom object) – atom to extract decorators from

Returns

decorators – tuple of all possible decorators for this atom

Return type

tuple of str
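A minimal sketch of how atom properties could map onto the seven decorators follows. `SimpleAtom` and its field names are hypothetical stand-ins for a ChemPer Atom, and the tuple order (atomic number, H, X, x, ring, charge, aromaticity) is assumed from the strings in the compare_atom example; ChemPer’s internal order may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SimpleAtom:
    """Hypothetical stand-in for a ChemPer Atom, holding only what the sketch needs."""
    atomic_number: int
    hydrogen_count: int
    connectivity: int        # total number of connections (X)
    ring_connectivity: int   # number of ring bonds (x)
    min_ring_size: int       # 0 when the atom is not in a ring
    formal_charge: int
    is_aromatic: bool


def make_atom_decorators(atom: SimpleAtom) -> Tuple[str, ...]:
    """Sketch: turn atom properties into SMIRKS decorator strings."""
    ring = "!r" if atom.min_ring_size == 0 else f"r{atom.min_ring_size}"
    # Charges are written with an explicit sign, so a neutral atom gives "+0".
    charge = f"{atom.formal_charge:+d}"
    aromatic = "a" if atom.is_aromatic else "A"
    return (
        f"#{atom.atomic_number}",
        f"H{atom.hydrogen_count}",
        f"X{atom.connectivity}",
        f"x{atom.ring_connectivity}",
        ring,
        charge,
        aromatic,
    )


# An aromatic CH carbon in a six-membered ring, like the input atom above.
print(make_atom_decorators(SimpleAtom(
    atomic_number=6, hydrogen_count=1, connectivity=3, ring_connectivity=2,
    min_ring_size=6, formal_charge=0, is_aromatic=True)))
# ('#6', 'H1', 'X3', 'x2', 'r6', '+0', 'a')
```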

class BondStorage(bonds=None, label=None)

BondStorage tracks information about a bond.

add_bond(bond)

Expand the current BondStorage by adding information about a new ChemPer Bond.

Parameters

bond (ChemPer Bond) – bond to add to this storage

as_smirks()
Returns

smirks – how this bond would be represented in a SMIRKS string using only the required number of SMIRKS decorators

Return type

str

compare_bond(bond)
Parameters

bond (ChemPer Bond) – bond you want to compare to the current storage

Returns

score – A score describing how similar the input bond is to any set of decorators currently in this storage, based on its SMIRKS decorators.

1 for a matching bond order plus 1 for a matching ring status (ring or non-ring bond)

Return type

int (0,1,2)
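The two-point bond score can be sketched the same way. This assumes, without confirmation from the source, that bond order and ring status are checked independently against everything already stored; the `Bond` tuple and the free function `compare_bond` are hypothetical, and an empty storage scores 0 in this sketch, which may not match ChemPer’s behavior.

```python
from typing import Sequence, Tuple

# Hypothetical bond record: (SMIRKS bond-order symbol, whether the bond is in a ring)
Bond = Tuple[str, bool]


def compare_bond(stored: Sequence[Bond], bond: Bond) -> int:
    """Hypothetical sketch of the 0-2 bond score described above."""
    orders = {order for order, _ in stored}
    ring_flags = {in_ring for _, in_ring in stored}
    order, in_ring = bond
    # +1 if this bond order has been seen, +1 if this ring status has been seen.
    return int(order in orders) + int(in_ring in ring_flags)


stored_bonds = [("-", False), ("=", False)]
print(compare_bond(stored_bonds, ("-", False)))  # 2
print(compare_bond(stored_bonds, ("-", True)))   # 1
print(compare_bond(stored_bonds, (":", True)))   # 0
```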

add_mol(input_mol, smirks_atoms_list)

Expand the information in this graph by adding a new molecule.

Parameters
  • input_mol (ChemPer Mol) – molecule to add to this graph

  • smirks_atoms_list (list of tuples) – This is a list of tuples with atom indices [ (indices), … ]

as_smirks(compress=False)
Parameters

compress (boolean) – if True, return the shorter version of atom SMIRKS patterns, in which decorators common to every set are “AND”ed at the end rather than repeated in each of the OR’d sets. For example, “[#6AH2X3x0!r+0,#6AH1X3x0!r+0:1]-;!@[#1AH0X1x0!r+0]” compresses to “[#6H2,#6H1;AX3x0!r+0:1]-;!@[#1AH0X1x0!r+0]”.

Returns

SMIRKS – a SMIRKS string matching the exact atom and bond information stored

Return type

str

find_pairs(atoms_and_bonds, storages)

find_pairs determines which of the current storages (AtomStorage and BondStorage objects) each set of atoms and bonds should be paired with. It uses the maximum weight matching algorithm in networkx to find the pairing with the highest total “score”. Scores are computed by the comparison functions of the atom and bond storage objects (compare_atom and compare_bond), which compare each storage to the new atom or bond.

If the numbers of atoms and storages differ, the entries left unmatched are paired with None.

Parameters
  • atoms_and_bonds (list of tuples in form (ChemPer Atom, ChemPer Bond, ...)) – atoms and bonds to pair with the storages

  • storages (list of tuples in form (AtomStorage, BondStorage, ...)) – storage objects the atoms and bonds can be paired with

Tuples can be of any length as long as they all have the same length. For example, in a bond you would compare (atom1,) and (atom2,) with (atom_storage1,) and (atom_storage2,). However, in a torsion you might want the atoms and bonds for each outer bond, so in that case you would compare (atom1, bond1, atom2) and (atom4, bond3, atom3) with the corresponding storage objects.

Returns

pairs – pairs of atoms and storage objects that are most similar. Each list always has the form (all atoms/bonds, all storage objects). For the bond example above you might get [ [atom1, atom_storage1], [atom2, atom_storage2] ]; for the torsion example you might get [ [atom4, bond3, atom3, atom_storage1, bond_storage1, atom_storage2], [atom1, bond1, atom2, atom_storage4, bond_storage3, atom_storage3] ]

Return type

list of lists
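The pairing step can be illustrated without networkx. The sketch below is a swapped-in technique: it brute-forces every permutation to find the assignment with the highest total score, which for non-negative scores reaches the same best total as a maximum weight matching but only scales to a handful of entries. `find_pairs`, `same_element`, and the string “storages” are hypothetical, and the returned pairs are [item, storage] rather than the flattened lists ChemPer returns.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, TypeVar

Item = TypeVar("Item")
Storage = TypeVar("Storage")


def find_pairs(items: Sequence[Item], storages: Sequence[Storage],
               score: Callable[[Item, Storage], float]) -> List[list]:
    """Hypothetical sketch: pair items with storages so the total score is maximal.

    Exhaustive search over permutations stands in for networkx's matching; it is
    only practical for the few atoms that make up a SMIRKS fragment.
    """
    n = max(len(items), len(storages))
    # Pad the shorter side with None so that leftover entries get a None partner.
    left: List[Optional[Item]] = list(items) + [None] * (n - len(items))
    right: List[Optional[Storage]] = list(storages) + [None] * (n - len(storages))
    best_order: tuple = tuple(range(n))
    best_total = float("-inf")
    for order in permutations(range(n)):
        # Pairs involving a None padding entry contribute nothing to the total.
        total = sum(score(left[i], right[j]) for i, j in enumerate(order)
                    if left[i] is not None and right[j] is not None)
        if total > best_total:
            best_order, best_total = order, total
    return [[left[i], right[j]] for i, j in enumerate(best_order)]


def same_element(atom: str, storage: str) -> float:
    # Toy score: 1 when the storage label starts with the atom's element symbol.
    return 1.0 if storage.startswith(atom) else 0.0


print(find_pairs(["C", "H"], ["H-storage", "C-storage"], same_element))
# [['C', 'C-storage'], ['H', 'H-storage']]
print(find_pairs(["C"], ["H-storage", "C-storage"], same_element))
# [['C', 'C-storage'], [None, 'H-storage']]
```

The second call shows the unequal case: every atom is paired and the leftover storage gets None.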

get_symmetry_funct(sym_label)

Determine the symmetry function that should be used when adding atoms to this graph.

For example, imagine a user is trying to make a SMIRKS for all of the C-H bonds in methane. In most toolkits the index for the carbon is 0 and the hydrogens are 1, 2, 3, and 4. The final SMIRKS should have the form [#6AH4X4x0!r+0:1]-;!@[#1AH0X1x0!r+0] no matter what order the atoms are input into ClusterGraph. So if the user provides (0,1), (0,2), (3,0), (4,0), ClusterGraph should figure out that the carbons in (3,0) and (4,0) belong in the atom index :1 position, as they were in the first set of atoms.

Bond atoms in (1,2) and (2,1) are symmetric; for angles the symmetric orderings are (1,2,3) and (3,2,1); for proper torsions, (1,2,3,4) and (4,3,2,1); and for improper torsions, (1,2,3,4), (3,2,1,4), and (4,2,1,3). For any other fragment type the atoms are added to the graph in the order they are provided, since the symmetry function is unknown.

TODO: In theory you could generalize this for generic linear fragments, where those with an odd number of atoms behave like angles and those with an even number behave like proper torsions; however, I think that is going to be outside the scope of ChemPer for the foreseeable future.

Parameters

sym_label (str or None) – type of symmetry; the options that change how symmetry is handled in the graph are “bond”, “angle”, “ProperTorsion”, and “ImproperTorsion”

Returns

symmetry_funct – returns the function that should be used to handle the appropriate symmetry

Return type

function
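Symmetry handling can be sketched as trying each equivalent ordering of a fragment and keeping the one that best agrees with what has already been added. The names `EQUIVALENT_ORDERS`, `best_ordering`, and `matches_reference` are hypothetical; the orderings are the ones listed above converted to 0-based positions, and the element-matching score is a simplified stand-in for whatever comparison ChemPer uses. ChemPer’s own symmetry functions may be structured differently.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Equivalent orderings listed above for each sym_label, as 0-based positions.
EQUIVALENT_ORDERS: Dict[str, List[Tuple[int, ...]]] = {
    "bond": [(0, 1), (1, 0)],
    "angle": [(0, 1, 2), (2, 1, 0)],
    "ProperTorsion": [(0, 1, 2, 3), (3, 2, 1, 0)],
    "ImproperTorsion": [(0, 1, 2, 3), (2, 1, 0, 3), (3, 1, 0, 2)],
}


def best_ordering(sym_label: Optional[str], atoms: Sequence[int],
                  score: Callable[[Tuple[int, ...]], float]) -> Tuple[int, ...]:
    """Hypothetical sketch: pick the equivalent ordering of `atoms` with the best score."""
    orders = EQUIVALENT_ORDERS.get(sym_label) if sym_label is not None else None
    if orders is None:
        # Unknown fragment type: keep the atoms in the order they were given.
        return tuple(atoms)
    candidates = [tuple(atoms[i] for i in order) for order in orders]
    # max() keeps the first candidate on ties, so the input order wins a tie.
    return max(candidates, key=score)


# Methane C-H example: atom 0 is carbon, atoms 1-4 are hydrogens.
elements = {0: "C", 1: "H", 2: "H", 3: "H", 4: "H"}
reference = ("C", "H")  # element pattern of the first pair, (0, 1)


def matches_reference(atoms: Tuple[int, ...]) -> float:
    # Count positions whose element agrees with the first pair added.
    return sum(elements[a] == ref for a, ref in zip(atoms, reference))


print([best_ordering("bond", pair, matches_reference)
       for pair in [(0, 1), (0, 2), (3, 0), (4, 0)]])
# [(0, 1), (0, 2), (0, 3), (0, 4)]
```

With the methane pairs from the example, (3,0) and (4,0) are flipped so that the carbon lands in the first position.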