Score

The Score module defines interfaces for running machine learning (ML) tasks on distributed systems. Each implementation of Scorer provides tools for sending models to remote compute nodes, preparing molecular data for training or inference, and executing training and inference on those nodes.

Available Interfaces

ExaMol provides interfaces to several libraries which support ML on molecular property data.

| Interface | Model Types | Description |
|---|---|---|
| RDKitScorer | Conventional ML | Models which use fingerprints computed from RDKit as inputs to scikit-learn Pipelines. |
| NFPScorer | MPNNs | Neural networks based on the Neural Fingerprints (nfp) library, which is backed by TensorFlow. |

The module for each type of learning algorithm provides helper functions to generate models. For example, make_knn_model() creates a k-nearest neighbors (KNN) model.
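As an illustration of what a fingerprint-based KNN model computes, the sketch below predicts a property as the similarity-weighted average over the k most similar training molecules. It is pure Python and is not the code behind make_knn_model(); the set-of-integers fingerprints, the `jaccard` and `knn_predict` names, and the toy values are all assumptions made for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard (Tanimoto) similarity between two sets of active fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query, train_fps, train_values, k=2):
    """Similarity-weighted average over the k most similar training fingerprints."""
    scored = sorted(
        ((jaccard(query, fp), y) for fp, y in zip(train_fps, train_values)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        # No overlap with any neighbor: fall back to an unweighted mean
        return sum(y for _, y in scored) / len(scored)
    return sum(sim * y for sim, y in scored) / total


# Hypothetical fingerprints (indices of "on" bits) and property values
train_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
train_values = [1.0, 2.0, 10.0]
prediction = knn_predict({1, 2, 3, 4}, train_fps, train_values, k=2)
```

The query shares three of four bits with each of the first two molecules and none with the third, so only the first two contribute to the prediction.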

Using Scorers

Scorers split data pre-processing, model transmission, and ML execution into distinct steps so that each step can be distributed across supercomputing resources.

Consider model training as an example. Start by creating a scorer, a model it will train, and the recipe describing the computations to be learned.

scorer = RDKitScorer()
recipe = RedoxEnergy(charge=1, config_name='xtb')
model = make_knn_model()

Training the model requires first transforming the available molecule data (as molecule data records) into inputs and outputs compatible with the scorer.

outputs = scorer.transform_outputs(records, recipe)  # Outputs are specific to a recipe
inputs = scorer.transform_inputs(records)  # Inputs are not

Then, convert the model into a form that can be transmitted across nodes:

model_msg = scorer.prepare_message(model, training=True)
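To illustrate why a separate message step helps, the sketch below encodes only the parameters a worker needs and leaves bulky local state behind. `ToyModel`, `to_message`, and `from_message` are hypothetical names for this example and say nothing about how ExaMol encodes its own messages.

```python
import json


class ToyModel:
    """Hypothetical model: a learned constant plus bulky local-only state."""

    def __init__(self, value: float = 0.0):
        self.value = value
        self.history = [0.0] * 10_000  # e.g. a training log that workers never need


def to_message(model: ToyModel) -> bytes:
    """Encode only the state needed to run the model elsewhere."""
    return json.dumps({"value": model.value}).encode("utf-8")


def from_message(message: bytes) -> ToyModel:
    """Rebuild a usable model on the receiving side."""
    return ToyModel(value=json.loads(message.decode("utf-8"))["value"])


model = ToyModel(value=3.5)
message = to_message(model)      # a few bytes, regardless of the size of `history`
restored = from_message(message)
```

Keeping the message small matters because it is sent with every remote task.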

ExaMol is now ready to run training on a remote node, and will use the output of training to update the local copy of the model:

update_msg = scorer.retrain(model_msg, inputs, outputs)  # Can be run remotely
model = scorer.update(model, update_msg)
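The full sequence can be sketched end to end with a stand-in scorer. Everything here is an assumption for illustration: `ToyScorer` and `MeanModel` are hypothetical classes that only mimic the step names used above, the records are plain dictionaries with made-up values, and a thread pool stands in for a remote node.

```python
from concurrent.futures import ThreadPoolExecutor
import json


class MeanModel:
    """Hypothetical model that predicts a single constant."""

    def __init__(self, value=0.0):
        self.value = value


class ToyScorer:
    """Stand-in scorer with the same step names as the walkthrough above."""

    def transform_inputs(self, records):
        return [r["smiles"] for r in records]

    def transform_outputs(self, records, prop):
        return [r[prop] for r in records]

    def prepare_message(self, model, training=False):
        return json.dumps({"value": model.value})

    def retrain(self, model_msg, inputs, outputs):
        # Uses only its arguments, so it can run wherever it is submitted
        return json.dumps({"value": sum(outputs) / len(outputs)})

    def update(self, model, update_msg):
        model.value = json.loads(update_msg)["value"]
        return model


# Made-up records: one input string and one property value each
records = [{"smiles": "C", "ie": 12.6}, {"smiles": "CC", "ie": 11.5}]
scorer, model = ToyScorer(), MeanModel()

inputs = scorer.transform_inputs(records)
outputs = scorer.transform_outputs(records, "ie")
model_msg = scorer.prepare_message(model, training=True)

with ThreadPoolExecutor(max_workers=1) as pool:  # stand-in for a remote node
    update_msg = pool.submit(scorer.retrain, model_msg, inputs, outputs).result()
model = scorer.update(model, update_msg)
```

Because `retrain` depends only on its arguments, the same call could be handed to any task executor without changing the surrounding steps.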

Multi-fidelity Learning

Some Scorer classes support using properties computed at lower levels of accuracy to improve performance. The strategies employed by each Scorer may be different, but all have the same interface.

Use the multi-fidelity capability of a Scorer by providing values from lower levels of fidelity when training or running inference.

from examol.score.utils.multifi import collect_outputs
fidelities = [RedoxEnergy(1, 'low'), RedoxEnergy(1, 'medium'), RedoxEnergy(1, 'high')]

# Get the inputs and outputs, as normal
inputs = scorer.transform_inputs(records)
outputs = scorer.transform_outputs(records, fidelities[-1])  # Train using the highest level

# Pass the low-fidelity results to training and inference
lower_fidelities = collect_outputs(records, fidelities[:-1])
scorer.retrain(model_msg, inputs, outputs, lower_fidelities=lower_fidelities)
...
scorer.score(model_msg, inputs, lower_fidelities=lower_fidelities)
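One widely used multi-fidelity strategy is delta learning, where the model learns the difference between the highest level and a lower one, and inference adds that correction to a known lower-fidelity value. The sketch below shows the idea with a constant correction. It is an assumption for illustration only: this section does not state which strategy any particular Scorer uses, and `train_delta` and `predict_high` are hypothetical names.

```python
from statistics import mean


def train_delta(high, low):
    """Learn a constant correction from the lower level to the highest level."""
    return mean(hi - lo for hi, lo in zip(high, low))


def predict_high(low, delta):
    """Estimate highest-fidelity values from known lower-fidelity values."""
    return [lo + delta for lo in low]


# Made-up property values for three molecules at two levels of fidelity
low_fidelity = [1.0, 2.0, 3.0]
high_fidelity = [1.5, 2.5, 3.5]

delta = train_delta(high_fidelity, low_fidelity)

# A new molecule for which only the cheap, low-fidelity value is known
estimates = predict_high([4.0], delta)
```

A real model would predict a molecule-dependent correction rather than a constant, but the inputs are the same: lower-fidelity values are needed at both training and inference time.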