SMIRKSifier
SMIRKSifier is a first attempt at ChemPer’s ultimate goal of creating hierarchical lists of SMIRKS patterns. When provided with clusters of molecular fragments, the SMIRKSifier generates an ordered list of SMIRKS patterns.
smirksify.py
In this script, we start with a set of clustered molecular fragments with specified indexed atoms, the same input you would use to build a ClusterGraph. We then build ClusterGraphs to create the initial SMIRKS patterns and check that the generated patterns retain the typing from the input clusters. Next, we run a series of iterations, each of which removes SMIRKS decorators. If such a “move” does not change the way the molecules are typed, the change is accepted.
This class takes inspiration from the tool SMIRKY previously published by the Open Force Field Initiative: github.com/openforcefield/smarty
In theory, this process of removing decorators could be made more systematic or deterministic. However, this is a first approach to see whether extracted SMIRKS patterns can do better than SMIRKY. This approach is also more general, since the input clusters do not rely on a reference force field.
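The accept/reject loop described above can be illustrated with a toy model. This is a minimal sketch, not ChemPer's implementation: patterns are plain sets of decorator strings, a pattern "matches" a fragment when all of its decorators are present, and the last matching pattern in the ordered list wins (an assumption about how the hierarchy is resolved). The names `assign_types` and `reduce_patterns` are hypothetical.

```python
import random


def assign_types(patterns, fragments):
    """Label each fragment with the LAST pattern it matches (assumed hierarchy rule)."""
    typing = []
    for features in fragments:
        label = None
        for name, required in patterns:
            if required <= features:  # toy match: every decorator is present
                label = name
        typing.append(label)
    return typing


def reduce_patterns(patterns, fragments, max_its=1000, seed=0):
    """Randomly remove decorators, keeping a removal only if typing is unchanged."""
    rng = random.Random(seed)
    current = [(name, set(required)) for name, required in patterns]
    reference = assign_types(current, fragments)
    for _ in range(max_its):
        idx = rng.randrange(len(current))
        name, required = current[idx]
        if not required:
            continue  # nothing left to remove from this pattern
        decorator = rng.choice(sorted(required))
        trial = list(current)
        trial[idx] = (name, required - {decorator})
        if assign_types(trial, fragments) == reference:
            current = trial  # accept the move
    return current


# Toy fragments and fully decorated starting patterns (illustrative only).
fragments = [{"#6", "X4", "H3"}, {"#6", "X3", "H1"}, {"#8", "X2", "H1"}]
patterns = [
    ("c_sp3", {"#6", "X4", "H3"}),
    ("c_sp2", {"#6", "X3", "H1"}),
    ("o", {"#8", "X2", "H1"}),
]
reduced = reduce_patterns(patterns, fragments)
```

Because a move is kept only when the typing is identical to the reference, the reduced list types the fragments exactly as the starting list did, whatever random choices were made.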
- exception chemper.smirksify.ClusteringError(msg)
Exception raised when the SMIRKSifier is unable to create a list of SMIRKS that maintains the input clusters.
- class chemper.smirksify.Reducer(smirks_list, mols, verbose=False)
Reducer starts with any list of SMIRKS and removes unnecessary decorators while maintaining typing on input molecules. This was created to be used as a part of the SMIRKSifier.reduce function. However, if you have complex SMIRKS and a list of molecules you can also reduce those patterns independently.
- current_smirks
current SMIRKS patterns in the form (label, smirks)
- Type
list of tuples
- mols
molecules being used to reduce the input SMIRKS
- Type
list of chemper molecules
- cluster_dict
Dictionary specifying typing using current SMIRKS in the form: {mol_idx:
{ (tuple of atom indices): label } }
- Type
dictionary
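To make the documented layout of cluster_dict concrete, here is a small sketch with made-up labels and atom indices. The helper `group_by_label` is hypothetical and only shows how such a nested dictionary can be traversed.

```python
from collections import defaultdict

# Hypothetical typing of two molecules, following the documented layout
# {mol_idx: {(tuple of atom indices): label}}.
cluster_dict = {
    0: {(0, 1): "b1", (1, 2): "b2"},
    1: {(0, 1): "b1"},
}


def group_by_label(cluster_dict):
    """Invert the mapping to {label: [(mol_idx, atom_indices), ...]}."""
    grouped = defaultdict(list)
    for mol_idx, typed_atoms in cluster_dict.items():
        for atom_indices, label in typed_atoms.items():
            grouped[label].append((mol_idx, atom_indices))
    return dict(grouped)


by_label = group_by_label(cluster_dict)
```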
- remove_all_bases(input_ors)
Convert all bases to [*], i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(*, [X4, +0]), (*, [X3]) ]
- Parameters
input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]
- Returns
new_ors – New OR decorators
- Return type
list of two tuples
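The transformation can be sketched on plain Python data. This assumes the OR decorators are a list of (base, [decorators]) tuples of strings, as in the example above. The function name is hypothetical and this is not the library's code.

```python
def remove_all_bases_sketch(input_ors):
    """Replace every base with the wildcard '*' and keep the decorators."""
    return [("*", list(decorators)) for _base, decorators in input_ors]


ors = [("#6", ["X4", "+0"]), ("#7", ["X3"])]
new_ors = remove_all_bases_sketch(ors)
```

The input list is left untouched, so a caller can discard the result if the move is rejected.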
- remove_all_dec_type(input_ors)
Remove all decorators of the same type, such as all ‘X’ decorators, i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, [+0]), (#7, []) ]
- Parameters
input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]
- Returns
new_ors – New OR decorators
- Return type
list of two tuples
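A sketch of the same idea on plain data follows. It assumes a decorator's type can be read from its leading character(s), and it takes the type as an explicit `dec_type` argument. That argument is hypothetical and is not part of the documented signature.

```python
def remove_dec_type_sketch(input_ors, dec_type):
    """Drop every decorator whose string starts with dec_type, e.g. 'X'."""
    return [
        (base, [d for d in decorators if not d.startswith(dec_type)])
        for base, decorators in input_ors
    ]


ors = [("#6", ["X4", "+0"]), ("#7", ["X3"])]
without_x = remove_dec_type_sketch(ors, "X")
```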
- remove_and(input_all_ands)
Removes a decorator that is AND’d in the original SMIRKS
- Parameters
input_all_ands (list) – List of AND decorators
- Returns
new_ands – List of new AND decorators
- Return type
list
- remove_decorator(smirks)
Choose an atom or bond in the input SMIRKS pattern and then remove one decorator from it.
- Parameters
smirks (str) – A SMIRKS string which should be reduced
- Returns
new_smirks (str) – A new SMIRKS pattern
is_changed (bool) – True if some of the decorators were successfully removed
- remove_one_sub_dec(input_ors, ref_idx)
Remove one OR decorator from the entry at the specified index, i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, [+0]), (#7, [X3]) ]
- Parameters
input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]
ref_idx (int) – The index from this list to use when removing one sub-decorator
- Returns
new_ors – New OR decorators
- Return type
list of two tuples
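As an illustration only, removing a single sub-decorator from one entry might look like the sketch below. Which sub-decorator is removed is not specified by the documentation above, so this sketch exposes the choice as a hypothetical `dec_idx` argument that defaults to the first one.

```python
def remove_one_sub_dec_sketch(input_ors, ref_idx, dec_idx=0):
    """Remove one decorator from the entry at ref_idx and leave the rest alone."""
    new_ors = [(base, list(decorators)) for base, decorators in input_ors]
    _base, decorators = new_ors[ref_idx]
    if decorators:
        del decorators[dec_idx]  # edits the copied list held in new_ors
    return new_ors


ors = [("#6", ["X4", "+0"]), ("#7", ["X3"])]
new_ors = remove_one_sub_dec_sketch(ors, ref_idx=0)
```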
- remove_or(input_all_ors, is_bond=False)
Changes the OR decorators by removing some of them
- Parameters
input_all_ors (list of tuples) – these are the OR decorators for an atom or bond from a ChemicalEnvironment
is_bond (boolean) – whether these decorators are from a bond (False for an atom)
- Returns
new_ors – new OR decorators
- Return type
list of two tuples
- remove_or_atom(input_all_ors, or_idx)
Makes specific types of changes based on atom OR decorators
- Parameters
input_all_ors (list of OR decorators) – [ (base, [decs]), …]
or_idx – index that should be used to guide changes
- Returns
new_ors – new or decorators
- Return type
list of two tuples
- remove_ref(input_ors, ref_idx)
Remove the decorator at the referenced index i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#7, [X3])]
- Parameters
input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]
ref_idx (int) – The OR decorators at ref_idx will be removed entirely
- Returns
new_ors – New OR decorators
- Return type
list of two tuples
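On plain list data, dropping the whole entry at ref_idx reduces to slicing around it. The sketch below uses a hypothetical function name and string tuples standing in for ChemicalEnvironment decorators.

```python
def remove_ref_sketch(input_ors, ref_idx):
    """Return a new list without the (base, decorators) entry at ref_idx."""
    return input_ors[:ref_idx] + input_ors[ref_idx + 1:]


ors = [("#6", ["X4", "+0"]), ("#7", ["X3"])]
new_ors = remove_ref_sketch(ors, 0)
```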
- remove_ref_sub_decs(input_ors, ref_idx)
Remove all of the OR decorators at the specified index, i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, []), (#7, [X3]) ]
- Parameters
input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]
ref_idx (int) – The index from this list to use when removing one set of sub-decorators
- Returns
new_ors – New OR decorators
- Return type
list of two tuples
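In contrast to removing the entry itself, this move keeps the base and empties only its decorator list. A hedged sketch on plain data, with a hypothetical function name:

```python
def remove_ref_sub_decs_sketch(input_ors, ref_idx):
    """Keep every base, but clear the decorators of the entry at ref_idx."""
    return [
        (base, [] if idx == ref_idx else list(decorators))
        for idx, (base, decorators) in enumerate(input_ors)
    ]


ors = [("#6", ["X4", "+0"]), ("#7", ["X3"])]
new_ors = remove_ref_sub_decs_sketch(ors, 0)
```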
- run(max_its=1000, verbose=None)
Reduce the SMIRKS decorators for a number of iterations
- Parameters
max_its (int) – The specified number of iterations
verbose (boolean) – will set the verboseness while running (if None, the current verbose variable will be used)
- Returns
final_smirks – list of final smirks patterns after reducing in the form [(label, smirks)]
- Return type
list of tuples
- class chemper.smirksify.SMIRKSifier(molecules, cluster_list, max_layers=5, verbose=True, strict_smirks=True)
Generates complex SMIRKS for a given cluster of substructures and then reduces the decorators in those smirks
- make_smirks()
Create a list of SMIRKS patterns for the input clusters. This includes determining how far away from the indexed atoms the SMIRKS must extend to distinguish the clusters (or stopping once max_layers is reached).
- Returns
smirks_list (list of tuples) – list of tuples in the form (label, smirks)
layers (int) – number of layers actually used to specify the set clusters
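The layer search can be pictured as a loop that grows the patterns until they reproduce the input clustering. The sketch below is an assumption about the control flow, not the library's code. `build_patterns` and `typing_is_preserved` are hypothetical callables, the local `ClusteringError` is a stand-in for the documented exception, and raising it when max_layers is exhausted is assumed.

```python
class ClusteringError(Exception):
    """Local stand-in for chemper.smirksify.ClusteringError."""


def make_smirks_sketch(build_patterns, typing_is_preserved, max_layers=5):
    """Add layers until the patterns reproduce the clusters, or give up."""
    for layers in range(max_layers + 1):
        smirks_list = build_patterns(layers)
        if typing_is_preserved(smirks_list):
            return smirks_list, layers
    raise ClusteringError(
        f"could not separate the clusters within {max_layers} layers"
    )


# Toy stand-ins: each layer appends one generic neighbor, and the "clusters"
# are considered resolved once two neighbors are included.
def build_patterns(layers):
    return [("c1", "[*:1]" + "~[*]" * layers)]


def typing_is_preserved(smirks_list):
    return smirks_list[0][1].count("~") >= 2


smirks_list, layers = make_smirks_sketch(build_patterns, typing_is_preserved)
```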
- reduce(max_its=1000, verbose=None)
Reduce the SMIRKS decorators for a number of iterations
- Parameters
max_its (int) – default = 1000 The specified number of iterations
verbose (boolean) – default = None will set the verboseness while running (if None, the current verbose variable will be used)
- Returns
final_smirks – list of final smirks patterns after reducing in the form [(label, smirks)]
- Return type
list of tuples
- types_match_reference(current_types=None)
Determine best match for each parameter with reference types
- Parameters
current_types (list of tuples with form [ (label, smirks), ]) –
- Returns
type_matches – pair of current and reference labels with the number of fragments that match it
- Return type
list of tuples (current_label, reference_label, counts)
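The documented return shape, (current_label, reference_label, counts), can be illustrated by counting how often each pair of labels lands on the same fragment. The sketch below picks, for each current label, the reference label with the largest overlap. This simple rule is an assumption and may differ from the matching strategy ChemPer actually uses. All names and data are hypothetical.

```python
from collections import Counter


def match_types_sketch(current, reference):
    """current and reference map a fragment key to a label."""
    pair_counts = Counter(
        (current[key], reference[key]) for key in current if key in reference
    )
    best = {}
    for (cur_label, ref_label), count in pair_counts.items():
        if cur_label not in best or count > best[cur_label][1]:
            best[cur_label] = (ref_label, count)
    return [(cur, ref, count) for cur, (ref, count) in sorted(best.items())]


# Fragment keys are (mol_idx, atom_indices) pairs, chosen for illustration.
current = {(0, (0, 1)): "zz_1", (0, (1, 2)): "zz_1", (1, (0, 1)): "zz_2"}
reference = {(0, (0, 1)): "c1", (0, (1, 2)): "c1", (1, (0, 1)): "c2"}
matches = match_types_sketch(current, reference)
```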