SMIRKSifier

SMIRKSifier is a first attempt at ChemPer’s ultimate goal of creating hierarchical lists of SMIRKS patterns. When provided with clusters of molecular fragments, the SMIRKSifier generates an ordered list of SMIRKS patterns.

smirksify.py

In this script, we start with a set of clustered molecular fragments with specified indexed atoms, the same input you would use to build a ClusterGraph. We then build ClusterGraphs to create the initial SMIRKS patterns and check that the generated SMIRKS patterns retain the typing from the input clusters. Next, we run a series of iterations, each of which removes SMIRKS decorators. If this “move” does not change the way the molecules are typed, the change is accepted.
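The accept/reject idea described above can be illustrated with a toy model. This is a minimal sketch, not ChemPer's implementation: fragments and patterns are plain sets of decorator strings instead of molecules and SMIRKS, the function names are hypothetical, and it assumes that the last matching pattern in the ordered list assigns the label.

```python
import random

def assign_labels(patterns, fragments):
    """Label each fragment with the LAST pattern whose decorators it contains."""
    labels = []
    for frag in fragments:
        label = None
        for name, decs in patterns:
            if decs <= frag:  # toy "match": all decorators are present
                label = name
        labels.append(label)
    return labels

def reduce_patterns(patterns, fragments, max_its=100, seed=0):
    """Randomly remove decorators, keeping only moves that preserve typing."""
    rng = random.Random(seed)
    current = [(name, set(decs)) for name, decs in patterns]
    reference = assign_labels(current, fragments)
    for _ in range(max_its):
        idx = rng.randrange(len(current))
        name, decs = current[idx]
        if not decs:
            continue  # nothing left to remove from this pattern
        removed = rng.choice(sorted(decs))
        trial = list(current)
        trial[idx] = (name, decs - {removed})
        if assign_labels(trial, fragments) == reference:
            current = trial  # typing unchanged, so accept the move
    return current

# Toy fragments and fully decorated starting patterns (illustrative only)
fragments = [
    frozenset({'#6', 'X4', 'H3'}),
    frozenset({'#6', 'X3', 'H1'}),
    frozenset({'#8', 'X2', 'H1'}),
]
patterns = [
    ('c_sp3', {'#6', 'X4', 'H3'}),
    ('c_sp2', {'#6', 'X3', 'H1'}),
    ('o', {'#8', 'X2', 'H1'}),
]
reduced = reduce_patterns(patterns, fragments)
```

Because later patterns override earlier ones in this toy, early patterns can often lose more decorators than later ones, which is the benefit of a hierarchical list.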

This class takes inspiration from the tool SMIRKY previously published by the Open Force Field Initiative: github.com/openforcefield/smarty

In theory, this process of removing decorators could be made more systematic or deterministic; however, this is a first approach to see whether extracted SMIRKS patterns can do better than SMIRKY. This approach is also more general, since the input clusters do not rely on a reference force field.

exception chemper.smirksify.ClusteringError(msg)[source]

Exception raised when the SMIRKSifier is unable to create a list of SMIRKS that maintains the input clusters.

class chemper.smirksify.Reducer(smirks_list, mols, verbose=False)[source]

Reducer starts with any list of SMIRKS and removes unnecessary decorators while maintaining typing on input molecules. This was created to be used as a part of the SMIRKSifier.reduce function. However, if you have complex SMIRKS and a list of molecules you can also reduce those patterns independently.

current_smirks

current SMIRKS patterns in the form (label, smirks)

Type

list of tuples

mols

molecules being used to reduce the input SMIRKS

Type

list of chemper molecules

cluster_dict

Dictionary specifying typing using current SMIRKS in the form: {mol_idx: { (tuple of atom indices): label } }

Type

dictionary
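To make the nested form concrete, here is a small sketch of a dictionary in that shape, together with a hypothetical helper (not part of ChemPer) that regroups it by label. The labels and atom indices are invented for illustration.

```python
from collections import defaultdict

# Hypothetical labels and atom indices, for illustration only
cluster_dict = {
    0: {(0, 1): 'c1', (1, 2): 'c2'},  # molecule 0: two typed fragments
    1: {(0, 1): 'c1'},                # molecule 1: one typed fragment
}

def group_by_label(typing):
    """Invert {mol_idx: {atom_indices: label}} into {label: {(mol_idx, atom_indices)}}."""
    groups = defaultdict(set)
    for mol_idx, typed_fragments in typing.items():
        for atom_indices, label in typed_fragments.items():
            groups[label].add((mol_idx, atom_indices))
    return dict(groups)

groups = group_by_label(cluster_dict)
```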

remove_all_bases(input_ors)[source]

Convert all bases to [*], i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(*, [X4, +0]), (*, [X3]) ]

Parameters

input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]

Returns

new_ors – New OR decorators

Return type

list of two tuples
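As a rough illustration of this transformation on plain Python data, the sketch below uses strings for bases and decorators. The function name is hypothetical and this is not the library's code.

```python
def remove_all_bases_sketch(input_ors):
    """Replace every base with the wildcard '*', keeping its decorators."""
    return [('*', list(decs)) for _base, decs in input_ors]

ors = [('#6', ['X4', '+0']), ('#7', ['X3'])]
new_ors = remove_all_bases_sketch(ors)
```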

remove_all_dec_type(input_ors)[source]

remove all decorators of the same type, like all ‘X’ decorators i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, [+0]), (#7, []) ]

Parameters

input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]

Returns

new_ors – New OR decorators

Return type

list of two tuples
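A minimal sketch of the same idea on plain Python data follows. It assumes a decorator's "type" is the decorator with its digits stripped (so 'X4' and 'X3' share type 'X'), and it takes the type as an explicit argument to stay deterministic; both the helper names and that argument are hypothetical.

```python
def decorator_type(decorator):
    """Assumed notion of type: the decorator with digits removed."""
    return ''.join(ch for ch in decorator if not ch.isdigit())

def remove_all_dec_type_sketch(input_ors, dec_type):
    """Drop every decorator whose type equals dec_type, in every OR entry."""
    return [
        (base, [d for d in decs if decorator_type(d) != dec_type])
        for base, decs in input_ors
    ]

ors = [('#6', ['X4', '+0']), ('#7', ['X3'])]
new_ors = remove_all_dec_type_sketch(ors, 'X')
```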

remove_and(input_all_ands)[source]

removes a decorator that is AND’d in the original SMIRKS

Parameters

input_all_ands (list) – List of AND decorators

Returns

new_ands – List of new AND decorators

Return type

list

remove_decorator(smirks)[source]

Choose an atom or bond in the input SMIRKS pattern and then remove one decorator from it.

Parameters

smirks (str) – A SMIRKS string which should be reduced

Returns

  • new_smirks (str) – A new SMIRKS pattern

  • is_changed (bool) – True if some of the decorators were successfully removed

remove_one_sub_dec(input_ors, ref_idx)[source]

Remove one OR decorator from the entry at the specified index, i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, [+0]), (#7, [X3]) ]

Parameters
  • input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]

  • ref_idx (int) – The index from this list to use when removing one sub-decorator

Returns

new_ors – New OR decorators

Return type

list of two tuples
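The example above can be reproduced on plain Python data with the sketch below. The name and the extra dec_idx argument are hypothetical; dec_idx is passed explicitly so that the example is deterministic.

```python
def remove_one_sub_dec_sketch(input_ors, ref_idx, dec_idx):
    """Remove the decorator at dec_idx from the OR entry at ref_idx."""
    new_ors = [(base, list(decs)) for base, decs in input_ors]
    base, decs = new_ors[ref_idx]
    if decs:  # nothing to remove if the entry has no decorators
        del decs[dec_idx]
    return new_ors

ors = [('#6', ['X4', '+0']), ('#7', ['X3'])]
new_ors = remove_one_sub_dec_sketch(ors, ref_idx=0, dec_idx=0)
```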

remove_or(input_all_ors, is_bond=False)[source]

Changes the OR decorators by removing some of them

Parameters
  • input_all_ors (list of tuples) – these are the OR decorators for an atom or bond from a ChemicalEnvironment

  • is_bond (boolean) – are these decorators from a bond (False for atom)

Returns

new_ors – new OR decorators

Return type

list of two tuples

remove_or_atom(input_all_ors, or_idx)[source]

makes specific types of changes based on atom OR decorators

Parameters
  • input_all_ors (list of OR decorators) – [ (base, [decs]), …]

  • or_idx (int) – index that should be used to guide changes

Returns

new_ors – new OR decorators

Return type

list of two tuples

remove_ref(input_ors, ref_idx)[source]

Remove the decorator at the referenced index i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#7, [X3])]

Parameters
  • input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]

  • ref_idx (int) – The OR decorators at ref_idx will be removed entirely

Returns

new_ors – New OR decorators

Return type

list of two tuples
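On plain Python data, dropping the whole entry at ref_idx could look like the following sketch (hypothetical name, not the library's code).

```python
def remove_ref_sketch(input_ors, ref_idx):
    """Return a copy of the OR list without the entry at ref_idx."""
    return [
        (base, list(decs))
        for idx, (base, decs) in enumerate(input_ors)
        if idx != ref_idx
    ]

ors = [('#6', ['X4', '+0']), ('#7', ['X3'])]
new_ors = remove_ref_sketch(ors, 0)
```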

remove_ref_sub_decs(input_ors, ref_idx)[source]

Remove all of the OR decorators at the specified index, i.e. [(#6, [X4, +0]), (#7, [X3]) ] –> [(#6, []), (#7, [X3]) ]

Parameters
  • input_ors (list of two tuples) – OR decorators in the form from ChemicalEnvironments that is [ (base, [decorators, ]), … ]

  • ref_idx (int) – The index from this list to use when removing one set of sub-decorators

Returns

new_ors – New OR decorators

Return type

list of two tuples
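In contrast to removing the entry entirely, this move keeps the base and clears only its decorators. A sketch on plain Python data, with a hypothetical name:

```python
def remove_ref_sub_decs_sketch(input_ors, ref_idx):
    """Keep every base, but empty the decorator list of the entry at ref_idx."""
    return [
        (base, [] if idx == ref_idx else list(decs))
        for idx, (base, decs) in enumerate(input_ors)
    ]

ors = [('#6', ['X4', '+0']), ('#7', ['X3'])]
new_ors = remove_ref_sub_decs_sketch(ors, 0)
```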

run(max_its=1000, verbose=None)[source]

Reduce the SMIRKS decorators for a number of iterations

Parameters
  • max_its (int) – The specified number of iterations

  • verbose (boolean) – will set the verboseness while running (if None, the current verbose variable will be used)

Returns

final_smirks – list of final smirks patterns after reducing in the form [(label, smirks)]

Return type

list of tuples

class chemper.smirksify.SMIRKSifier(molecules, cluster_list, max_layers=5, verbose=True, strict_smirks=True)[source]

Generates complex SMIRKS for a given cluster of substructures and then reduces the decorators in those smirks

make_smirks()[source]

Create a list of SMIRKS patterns for the input clusters. This includes determining how far away from the indexed atoms the SMIRKS must extend to maintain the input clusters (or stopping once max_layers is reached).

Returns

  • smirks_list (list of tuples) – list of tuples in the form (label, smirks)

  • layers (int) – number of layers actually used to specify the set of clusters
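The layer search can be pictured as a loop that adds layers until the typing matches. The sketch below is a guess at the control flow, not ChemPer's code: build_patterns and typing_matches are hypothetical callables standing in for pattern generation and the typing check, and it assumes the search starts at zero layers and that max_layers is inclusive.

```python
class ClusteringErrorSketch(Exception):
    """Stand-in for an error raised when no layer count reproduces the clusters."""

def make_smirks_sketch(build_patterns, typing_matches, max_layers=5):
    """Return (smirks_list, layers) for the first layer count that keeps the typing."""
    for layers in range(max_layers + 1):
        smirks_list = build_patterns(layers)
        if typing_matches(smirks_list):
            return smirks_list, layers
    raise ClusteringErrorSketch(
        'no list of patterns maintained the clusters within %d layers' % max_layers
    )

# Toy stand-ins: pretend that two layers are needed to separate the clusters
build = lambda layers: [('c1', 'pattern_with_%d_layers' % layers)]
matches = lambda smirks_list: smirks_list[0][1] == 'pattern_with_2_layers'
smirks_list, layers = make_smirks_sketch(build, matches)
```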

reduce(max_its=1000, verbose=None)[source]

Reduce the SMIRKS decorators for a number of iterations

Parameters
  • max_its (int) – The specified number of iterations (default = 1000)

  • verbose (boolean) – will set the verboseness while running; if None, the current verbose variable will be used (default = None)

Returns

final_smirks – list of final smirks patterns after reducing in the form [(label, smirks)]

Return type

list of tuples

types_match_reference(current_types=None)[source]

Determine best match for each parameter with reference types

Parameters

current_types (list of tuples with form [ (label, smirks), ]) –

Returns

type_matches – pair of current and reference labels with the number of fragments that match it

Return type

list of tuples (current_label, reference_label, counts)
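One way to read "best match" is: for each current label, find the reference label shared by the most fragments. The sketch below implements that reading on plain dictionaries keyed by (mol_idx, atom_indices). The function name, the input format, and the interpretation itself are assumptions rather than the library's documented behavior.

```python
from collections import Counter

def best_label_matches(current, reference):
    """Return sorted (current_label, reference_label, count) tuples."""
    pair_counts = Counter(
        (current[key], reference[key]) for key in current if key in reference
    )
    best = {}
    for (cur_label, ref_label), count in pair_counts.items():
        if cur_label not in best or count > best[cur_label][1]:
            best[cur_label] = (ref_label, count)
    return sorted((cur, ref, count) for cur, (ref, count) in best.items())

# Hypothetical typings for three fragments across two molecules
current = {(0, (0, 1)): 'a', (0, (1, 2)): 'a', (1, (0, 1)): 'b'}
reference = {(0, (0, 1)): 'c1', (0, (1, 2)): 'c1', (1, (0, 1)): 'c2'}
type_matches = best_label_matches(current, reference)
```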

chemper.smirksify.print_smirks(smirks_list)[source]

Prints out the provided SMIRKS list in a table-like format with label | SMIRKS

Parameters

smirks_list (list of tuples) – list in the form [ (label, SMIRKS), …]
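For readers who want a similar table without the library, a comparable layout can be built with string formatting. This is an illustrative sketch with a hypothetical function name; the exact column widths and header text that print_smirks produces may differ.

```python
def format_smirks_table(smirks_list):
    """Build a 'label | SMIRKS' table as a single string."""
    width = max([len('Label')] + [len(label) for label, _ in smirks_list])
    lines = [f"{'Label':<{width}} | SMIRKS"]
    for label, smirks in smirks_list:
        lines.append(f"{label:<{width}} | {smirks}")
    return '\n'.join(lines)

# Hypothetical labels with simple bond patterns
table = format_smirks_table([
    ('zz_c', '[#6:1]-[#1:2]'),
    ('o', '[#8:1]-[#1:2]'),
])
print(table)
```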