Score
The Score module defines interfaces for running machine learning (ML) tasks on distributed systems.
Each implementation of Scorer provides tools for sending models to remote compute nodes, preparing molecular data for training or inference, and executing training and inference on remote nodes.
Available Interfaces
ExaMol provides interfaces to several libraries which support ML on molecular property data.
| Interface | Model Types | Description |
|---|---|---|
| RDKitScorer | Conventional ML | Models which use fingerprints computed from RDKit as inputs to scikit-learn Pipelines. |
| | MPNNs | Neural networks based on the Neural Fingerprints (nfp) library, which is backed by TensorFlow. |
The module for each type of learning algorithm provides helper functions to generate models.
For example, make_knn_model() creates a k-nearest neighbors (KNN) model.
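As a rough illustration of the kind of model such a helper produces, the sketch below implements k-nearest-neighbor regression over bit fingerprints in plain Python. The names `jaccard_distance` and `knn_predict`, the choice of Jaccard distance, and the toy fingerprints are all assumptions made for this example; this is not the code behind make_knn_model(), which builds a scikit-learn Pipeline.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Jaccard (Tanimoto) distance between fingerprints stored as sets of 'on' bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def knn_predict(train_x, train_y, query, k=2):
    """Predict a property as the mean over the k nearest training fingerprints."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: jaccard_distance(pair[0], query))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)


# Toy fingerprints (hypothetical bit indices) and property values
train_x = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9})]
train_y = [1.0, 2.0, 10.0]
prediction = knn_predict(train_x, train_y, frozenset({1, 2}), k=2)
```

The query shares bits with the first two molecules only, so the prediction is the average of their property values.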
Using Scorers
Scorers split pre-processing data, transmitting models, and running ML tasks into distinct steps so that each step can be distributed across supercomputing resources.
Consider model training as an example. Start by creating a scorer, a model it will train, and the recipe describing the computations to be learned.
scorer = RDKitScorer()
recipe = RedoxEnergy(charge=1, config_name='xtb')
model = make_knn_model()
Training the model first requires transforming the available molecule data (stored as molecule data records) into inputs and outputs compatible with the scorer.
outputs = scorer.transform_outputs(records, recipe)  # Outputs are specific to a recipe
inputs = scorer.transform_inputs(records)  # Inputs are not
Then, convert the model into a form that can be transmitted across nodes:
model_msg = scorer.prepare_message(model, training=True)
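What a message contains depends on the scorer. As a minimal sketch, assuming the model state can be expressed as plain numbers and lists, a message could be the model serialized to JSON bytes. `ToyModel`, `to_message`, and `from_message` are hypothetical names used only for illustration and are not part of ExaMol.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyModel:
    """Stand-in for an ML model; holds only the state a remote worker needs."""
    k: int = 3
    weights: list = field(default_factory=list)


def to_message(model: ToyModel) -> bytes:
    """Serialize the model into bytes that can be shipped to another node."""
    return json.dumps(asdict(model)).encode("utf-8")


def from_message(message: bytes) -> ToyModel:
    """Rebuild an equivalent model on the receiving side."""
    return ToyModel(**json.loads(message.decode("utf-8")))


model = ToyModel(k=5, weights=[0.1, 0.2])
message = to_message(model)
restored = from_message(message)
```

Keeping the message free of references to local objects is what allows the same bytes to be sent to any worker.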
ExaMol is now ready to run training on a remote node, and will use the output of training to update the local copy of the model:
update_msg = scorer.retrain(model_msg, inputs, outputs) # Can be run remotely
model = scorer.update(model, update_msg)
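The division of labor between these two calls can be sketched with a toy scorer: `retrain` depends only on its arguments, so it can run anywhere, and `update` folds the result back into the local model. `MeanModel` and `ToyScorer` are hypothetical stand-ins that mirror the method names used above; they are not ExaMol classes, and the thread pool merely stands in for a remote node.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class MeanModel:
    """Toy model that predicts the mean of its training outputs."""
    mean: float = 0.0


class ToyScorer:
    def prepare_message(self, model: MeanModel, training: bool = False) -> dict:
        # A plain dict is easy to send between processes or nodes
        return {"mean": model.mean}

    def retrain(self, model_msg: dict, inputs: list, outputs: list) -> dict:
        # Uses only its arguments, so it is safe to execute on another worker
        return {"mean": sum(outputs) / len(outputs)}

    def update(self, model: MeanModel, update_msg: dict) -> MeanModel:
        # Apply the result of remote training to the local copy
        model.mean = update_msg["mean"]
        return model


scorer = ToyScorer()
model = MeanModel()
model_msg = scorer.prepare_message(model, training=True)

# The thread pool stands in for a remote node
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(scorer.retrain, model_msg, [[0], [1]], [1.0, 3.0])
    update_msg = future.result()

model = scorer.update(model, update_msg)
```

Because only messages and arrays cross the boundary, the local model object never has to leave the process that owns it.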
Multi-fidelity Learning
Some Scorer classes support using properties computed at lower levels of accuracy to improve performance. The strategies employed by each Scorer may be different, but all have the same interface.
Use the multi-fidelity capability of a Scorer by providing values from lower levels of fidelity when training or running inference.
from examol.score.utils.multifi import collect_outputs
fidelities = [RedoxEnergy(1, 'low'), RedoxEnergy(1, 'medium'), RedoxEnergy(1, 'high')]
# Get the inputs and outputs, as normal
inputs = scorer.transform_inputs(records)
outputs = scorer.transform_outputs(records, fidelities[-1]) # Train using the highest level
# Pass the low-fidelity results to training and inference
lower_fidelities = collect_outputs(records, fidelities[:-1])
scorer.retrain(model_msg, inputs, outputs, lower_fidelities=lower_fidelities)
...
scorer.score(model_msg, inputs, lower_fidelities=lower_fidelities)
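One possible multi-fidelity strategy is delta learning, where the model learns the correction from the best available lower-fidelity value to the target. The sketch below shows the idea with a constant correction. It is an assumption about what a strategy could look like, not a description of what any particular Scorer does, and `fit_delta` and `predict_with_delta` are hypothetical helpers.

```python
def fit_delta(outputs, lower_fidelities):
    """Learn the mean correction from the best lower-fidelity value to the target."""
    deltas = [y - low[-1] for y, low in zip(outputs, lower_fidelities)]
    return sum(deltas) / len(deltas)


def predict_with_delta(mean_delta, lower_fidelities):
    """Predict the target as the best lower-fidelity value plus the learned correction."""
    return [low[-1] + mean_delta for low in lower_fidelities]


# Each row holds hypothetical (low, medium) values for one molecule
lower_train = [[1.0, 1.5], [2.0, 2.5]]
high_train = [2.0, 3.0]
mean_delta = fit_delta(high_train, lower_train)

# Inference needs the lower-fidelity values too, as in the score() call above
predictions = predict_with_delta(mean_delta, [[3.0, 3.5]])
```

A real model would predict a molecule-specific correction from the inputs rather than a single constant, but the data flow is the same: lower-fidelity values are supplied at both training and inference time.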