Cluster Graphs
ClusterGraph is an expansion of the initial SingleGraph. A ClusterGraph can store information about multiple atoms and bonds simultaneously.
This is a starting example for how to use chemper’s ClusterGraph class to create SMIRKS patterns from clusters of molecular graphs.
cluster_graph.py
ClusterGraph is a class for tracking all possible SMIRKS decorators in a group (or cluster) of molecular fragments. Moving forward, these will be used to find the minimum number of SMIRKS decorators required for a set of SMIRKS patterns that maintains a given clustering of fragments.
- class chemper.graphs.cluster_graph.ClusterGraph(mols=None, smirks_atoms_lists=None, layers=0)
ChemPerGraphs are a graph-based class for storing atom and bond information. They use the chemper.mol_toolkits Atoms, Bonds, and Mols.
- class AtomStorage(atoms=None, label=None)
AtomStorage tracks information about an atom
- add_atom(atom)
Expand current AtomStorage by adding information about a new ChemPer Atom
- Parameters
atom (ChemPer Atom) –
- as_smirks(compress=False)
- Parameters
compress (boolean) – whether decorators common to all sets should be combined, for example ‘#6X4,#7X3;+0!r…’
- Returns
smirks – how this atom would be represented in a SMIRKS string with the minimal combination of SMIRKS decorators
- Return type
str
- compare_atom(atom)
Compares the decorators in this AtomStorage with the provided ChemPer atom. Each stored set of decorators is compared separately and the highest score is returned. For example, if this storage had two sets of decorators:
- #7H1X3x0!r+0A
- #6H1X4x0!r+0A
and the input atom had the decorators:
- #6H1X3x2r6+0a
then the score is calculated by counting the decorators in common:
- #7H1X3x0!r+0A and #6H1X3x2r6+0a have 3 decorators in common (H1, X3, +0)
- #6H1X4x0!r+0A and #6H1X3x2r6+0a also have 3 decorators in common (#6, H1, +0)
However, atoms with the same atomic number are weighted as more similar: the score is divided by 10 if the atomic numbers do not agree. Therefore the final scores are:
- 0.3 for #7H1X3x0!r+0A
- 3 for #6H1X4x0!r+0A
The highest score for any set of decorators is returned, so 3 is the returned score in this example.
- Parameters
atom (ChemPer Atom) –
- Returns
score – A score describing how similar the input atom is to any set of decorators currently in this storage, based on its SMIRKS decorators. This score ranges from 0 to 7. The maximum of 7 is the number of decorators on any atom: if the input atom matches one of the current decorator sets perfectly, then all 7 decorators agree. If the atomic numbers don’t agree, that set of decorators is considered less ideal, and the score is the number of other matching decorators divided by 10. If the current storage is empty, the score is 7, since any atom matches a wildcard atom.
- Return type
float
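The scoring rule described for compare_atom can be sketched in plain Python. This is only an illustration of the arithmetic above, not ChemPer's implementation: the `Decorators` tuple, `score_decorators`, and the standalone `compare_atom` function are hypothetical names, and encoding `!r` as a ring size of 0 is an assumption made for the sketch.

```python
from typing import List, NamedTuple


class Decorators(NamedTuple):
    """Hypothetical container for the seven decorators of one atom."""
    atomic_number: int       # '#6'
    hydrogen_count: int      # 'H1'
    connectivity: int        # 'X3'
    ring_connectivity: int   # 'x2'
    min_ring_size: int       # 'r6'; 0 stands for '!r' (assumption)
    charge: int              # '+0'
    aromatic: bool           # 'a' is True, 'A' is False


def score_decorators(stored: Decorators, new: Decorators) -> float:
    # Count agreeing decorators other than the atomic number.
    others = sum(1 for s, n in zip(stored[1:], new[1:]) if s == n)
    if stored.atomic_number == new.atomic_number:
        return others + 1    # the atomic number counts as one more match
    return others / 10       # down-weight sets with a different element


def compare_atom(stored_sets: List[Decorators], new: Decorators) -> float:
    if not stored_sets:
        return 7             # an empty storage acts as a wildcard atom
    return max(score_decorators(s, new) for s in stored_sets)


storage = [
    Decorators(7, 1, 3, 0, 0, 0, False),  # #7H1X3x0!r+0A
    Decorators(6, 1, 4, 0, 0, 0, False),  # #6H1X4x0!r+0A
]
new_atom = Decorators(6, 1, 3, 2, 6, 0, True)  # #6H1X3x2r6+0a
print(compare_atom(storage, new_atom))  # 3
```

The first stored set scores 0.3 and the second scores 3, so the maximum, 3, is returned.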
- class BondStorage(bonds=None, label=None)
BondStorage tracks information about a bond
- add_bond(bond)
Expand current BondStorage by adding information about a new ChemPer Bond
- Parameters
bond (ChemPer Bond) –
- as_smirks()
- Returns
smirks – how this bond would be represented in a SMIRKS string using only the required number of decorators
- Return type
str
- compare_bond(bond)
- Parameters
bond (ChemPer Bond) – bond you want to compare to the current storage
- Returns
score – A score describing how similar the input bond is to any set of decorators currently in this storage, based on its SMIRKS decorators: 1 for the bond order + 1 based on whether this is a ring bond.
- Return type
int (0,1,2)
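One reading of this scoring rule is sketched below. It assumes that a point is awarded when the new bond's order is already among the stored orders, and another when its ring membership is already among the stored ring flags. The standalone function and its parameter names are hypothetical and are not taken from ChemPer.

```python
from typing import Set


def compare_bond(stored_orders: Set[float], stored_ring: Set[bool],
                 order: float, is_ring: bool) -> int:
    """Return 0, 1, or 2 depending on how many bond decorators agree."""
    score = 0
    if order in stored_orders:   # bond order already seen in this storage
        score += 1
    if is_ring in stored_ring:   # ring membership already seen in this storage
        score += 1
    return score


# A storage that has so far only seen acyclic single bonds.
orders, ring_flags = {1}, {False}
print(compare_bond(orders, ring_flags, 1, False))  # 2
print(compare_bond(orders, ring_flags, 1, True))   # 1
print(compare_bond(orders, ring_flags, 2, True))   # 0
```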
- add_mol(input_mol, smirks_atoms_list)
Expand the information in this graph by adding a new molecule
- Parameters
input_mol (ChemPer Mol) –
smirks_atoms_list (list of tuples) – This is a list of tuples with atom indices [ (indices), … ]
- as_smirks(compress=False)
- Parameters
compress (boolean) – if True, returns the shorter version of atom SMIRKS patterns; that is, decorators shared by all sets are “anded” to the end rather than repeated in each of the sets that are OR’d together. For example, “[#6AH2X3x0!r+0,#6AH1X3x0!r+0:1]-;!@[#1AH0X1x0!r+0]” compresses to “[#6H2,#6H1;AX3x0!r+0:1]-;!@[#1AH0X1x0!r+0]”
- Returns
SMIRKS – a SMIRKS string matching the exact atom and bond information stored
- Return type
str
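The effect of compress on a single atom can be illustrated with a small string-level sketch. It assumes each OR'd set is given as an atomic-number token plus an ordered list of the remaining decorators; `compress_atom` is a hypothetical helper written for this example and does not reflect how ChemPer stores or builds its patterns.

```python
from typing import List, Tuple

DecoratorSet = Tuple[str, List[str]]  # (atomic-number token, other decorators)


def compress_atom(sets: List[DecoratorSet]) -> str:
    """Move decorators shared by every OR'd set into a trailing AND'd group."""
    if len(sets) == 1:
        base, decs = sets[0]
        return base + "".join(decs)   # nothing to factor out
    first = sets[0][1]
    common = [d for d in first if all(d in decs for _, decs in sets[1:])]
    # Each OR'd alternative keeps its base token and only its unique decorators.
    or_parts = [base + "".join(d for d in decs if d not in common)
                for base, decs in sets]
    ored = ",".join(or_parts)
    return ored + ";" + "".join(common) if common else ored


sets = [
    ("#6", ["A", "H2", "X3", "x0", "!r", "+0"]),
    ("#6", ["A", "H1", "X3", "x0", "!r", "+0"]),
]
print(compress_atom(sets))  # #6H2,#6H1;AX3x0!r+0
```

Wrapping the result in brackets and appending the `:1` label gives the compressed atom from the example in the parameter description.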
- find_pairs(atoms_and_bonds, storages)
find_pairs is used to determine which current AtomStorage from storages each atom should be paired with. This function takes advantage of the maximum scoring function in networkx to find the pairing with the highest “score”. Scores are determined using functions in the atom and bond storage objects that compare those storages to the new atom or bond.
If there are fewer atoms than storages, then the atoms with the lowest-scoring pair are assigned a None pairing.
- Parameters
atoms_and_bonds (list of tuples in form (ChemPer Atom, ChemPer Bond, ...)) –
storages (list of tuples in form (AtomStorage, BondStorage, ...)) –
Tuples can be of any length as long as they are all the same length. For example, in a bond you would compare (atom1,) and (atom2,) with (atom_storage1,) and (atom_storage2,). However, in a torsion you might want the atoms and bonds for each outer bond, so in that case you would compare (atom1, bond1, atom2) and (atom4, bond3, atom3) with the corresponding storage objects.
- Returns
pairs – pairs of atoms and storage objects that are most similar. These lists always come in the form (all atoms/bonds, all storage objects). For the bond example above you might get [ [atom1, storage1], [atom2, storage2] ]; for the torsion example you might get [ [atom4, bond3, atom3, atom_storage1, bond_storage1, atom_storage2], [atom1, bond1, atom2, atom_storage4, bond_storage3, atom_storage3] ]
- Return type
list of lists
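The pairing idea can be sketched without any dependencies. The description above says ChemPer relies on networkx for this step; the sketch below deliberately swaps in a brute-force search over permutations, which is only practical for the handful of items in a bond or torsion. The function name `best_pairs`, the `score` callback, and the toy data are all hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Item = TypeVar("Item")
Storage = TypeVar("Storage")


def best_pairs(items: Sequence[Item], storages: Sequence[Storage],
               score: Callable[[Item, Storage], float]
               ) -> List[Tuple[Optional[Item], Storage]]:
    """Assign items to storages so that the summed score is as high as possible."""
    # Pad with None so every storage receives a partner, even with too few items.
    padded = list(items) + [None] * (len(storages) - len(items))
    best, best_total = None, float("-inf")
    for perm in permutations(padded, len(storages)):
        total = sum(score(item, st)
                    for item, st in zip(perm, storages) if item is not None)
        if total > best_total:
            best, best_total = perm, total
    return list(zip(best, storages))


# Toy data: items are atomic numbers, storages are sets of atomic numbers seen so far.
def in_storage(item: int, storage: set) -> float:
    return 1.0 if item in storage else 0.0


print(best_pairs([6, 1], [{1}, {6}], in_storage))  # [(1, {1}), (6, {6})]
print(best_pairs([6], [{1}, {6}], in_storage))     # [(None, {1}), (6, {6})]
```

A matching algorithm scales far better than enumerating permutations; the brute-force version is used here only to keep the example self-contained.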
- get_symmetry_funct(sym_label)
Determine the symmetry function that should be used when adding atoms to this graph.
For example, imagine a user is trying to make a SMIRKS for all of the C-H bonds in methane. In most toolkits the index for the carbon is 0 and the hydrogens are 1,2,3,4. The final SMIRKS should have the form [#6AH4X4x0!r+0:1]-;!@[#1AH0X1x0!r+0] no matter what order the atoms are input into ClusterGraph. So if the user provides (0,1), (0,2), (3,0), (4,0) ClusterGraph should figure out that the carbons in (3,0) and (4,0) should be in the atom index :1 place like they were in the first set of atoms.
Bond atoms in (1,2) or (2,1) are symmetric; for angles the symmetric orderings are (1,2,3) or (3,2,1); for proper torsions (1,2,3,4) or (4,3,2,1); and for improper torsions (1,2,3,4), (3,2,1,4), (4,2,1,3). For any other fragment type the atoms will be added to the graph in the order they are provided, since the symmetry function is unknown.
TODO: In theory you could generalize this for generic linear fragments, where those with an odd number of atoms behave like angles and those with an even number behave like proper torsions; however, I think that is going to be outside the scope of ChemPer for the foreseeable future.
- Parameters
sym_label (str or None) – type of symmetry, options which will change the way symmetry is handled in the graph are “bond”, “angle”, “ProperTorsion”, and “ImproperTorsion”
- Returns
symmetry_funct – returns the function that should be used to handle the appropriate symmetry
- Return type
function
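The symmetric orderings listed above can be enumerated with a short helper. This is a sketch under stated assumptions: `symmetric_orderings` is a hypothetical name, the labels are matched case-insensitively for convenience, and it returns the candidate orderings directly rather than a symmetry function as get_symmetry_funct does.

```python
from typing import List, Optional, Sequence, Tuple


def symmetric_orderings(sym_label: Optional[str],
                        atoms: Sequence[int]) -> List[Tuple[int, ...]]:
    """List the equivalent atom orderings for a fragment of the given type."""
    a = tuple(atoms)
    label = (sym_label or "").lower()
    if label in ("bond", "angle", "propertorsion"):
        return [a, a[::-1]]                  # forward and reversed
    if label == "impropertorsion":
        # (1,2,3,4), (3,2,1,4), (4,2,1,3) as listed in the description
        return [a, (a[2], a[1], a[0], a[3]), (a[3], a[1], a[0], a[2])]
    return [a]                               # unknown symmetry: keep input order


# Methane C-H bonds: choose the ordering that puts the carbon (index 0) first.
# A real implementation would score orderings against the stored atoms instead.
carbon = 0
inputs = [(0, 1), (0, 2), (3, 0), (4, 0)]
ordered = [next(o for o in symmetric_orderings("bond", pair) if o[0] == carbon)
           for pair in inputs]
print(ordered)  # [(0, 1), (0, 2), (0, 3), (0, 4)]
```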