Score
=====

The Score module defines interfaces for running machine learning (ML) tasks on distributed systems.
Each implementation of :class:`~examol.score.base.Scorer` provides tools for sending models to remote compute nodes,
preparing molecular data for training or inference,
and executing training and inference on remote nodes.

Available Interfaces
--------------------

ExaMol provides interfaces to several libraries which support ML on molecular property data.

.. list-table::
   :header-rows: 1

   * - Interface
     - Model Types
     - Description
   * - :class:`~examol.score.rdkit.RDKitScorer`
     - Conventional ML
     - Models which use fingerprints computed from RDKit as inputs to scikit-learn Pipelines.
   * - :class:`~examol.score.nfp.NFPScorer`
     - MPNNs
     - Neural networks based on the Neural Fingerprints (nfp) library, which is backed by TensorFlow.

The module for each type of learning algorithm provides helper functions to generate models.
For example, :py:func:`~examol.score.rdkit.make_knn_model` creates a k-nearest neighbors (KNN) model.

Using Scorers
-------------

Scorers split data pre-processing, model transmission, and ML execution into separate steps
so that each step can be distributed across supercomputing resources.

Consider model training as an example.
Start by creating a scorer, a model it will train, and the recipe describing the computations to be learned.

.. code-block:: python

    from examol.score.rdkit import RDKitScorer, make_knn_model

    scorer = RDKitScorer()
    recipe = RedoxEnergy(charge=1, config_name='xtb')
    model = make_knn_model()

Training the model requires first transforming the available molecule data (as molecule data records)
into inputs and outputs compatible with the scorer.

.. code-block:: python

    outputs = scorer.transform_outputs(records, recipe)  # Outputs are specific to a recipe
    inputs = scorer.transform_inputs(records)  # Inputs are not

Then, convert the model into a form that can be transmitted across nodes:

.. code-block:: python

    model_msg = scorer.prepare_message(model, training=True)

ExaMol is now ready to run training on a remote node,
and will use the output of training to update the local copy of the model:

.. code-block:: python

    update_msg = scorer.retrain(model_msg, inputs, outputs)  # Can be run remotely
    model = scorer.update(model, update_msg)

Multi-fidelity Learning
-----------------------

Some Scorer classes support using properties computed at lower levels of accuracy to improve performance.
The strategies employed by each Scorer may be different, but all have the same interface.

Use the multi-fidelity capability of a Scorer by providing values from lower levels of fidelity
when training or running inference.

.. code-block:: python

    from examol.score.utils.multifi import collect_outputs

    fidelities = [RedoxEnergy(1, 'low'), RedoxEnergy(1, 'medium'), RedoxEnergy(1, 'high')]

    # Get the inputs and outputs, as normal
    inputs = scorer.transform_inputs(records)
    outputs = scorer.transform_outputs(records, fidelities[-1])  # Train using the highest level

    # Pass the low-fidelity results to training and inference
    lower_fidelities = collect_outputs(records, fidelities[:-1])
    scorer.retrain(model_msg, inputs, outputs, lower_fidelities=lower_fidelities)
    ...
    scorer.score(model_msg, inputs, lower_fidelities=lower_fidelities)
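
One way a Scorer could use lower-fidelity values is delta learning,
where the model learns only the correction from a cheaper level to the target level.
The sketch below is a toy illustration of that idea and of the transform, message, retrain, update, and score steps described above.
It does not use ExaMol: ``ToyDeltaScorer``, the dictionary-based records, and the ``'low'`` and ``'high'`` keys are all hypothetical,
and the strategy used by any actual ExaMol Scorer may differ.

```python
import pickle
from statistics import mean

# Hypothetical records: a molecule identifier plus a property value
# at two levels of fidelity (not ExaMol's record format)
records = [
    {"smiles": "C", "low": 1.0, "high": 1.5},
    {"smiles": "CC", "low": 2.0, "high": 2.5},
    {"smiles": "CCC", "low": 3.0, "high": 3.5},
]


class ToyDeltaScorer:
    """Toy scorer that learns a constant low-to-high fidelity correction."""

    def transform_inputs(self, records):
        # Inputs do not depend on which property is being learned
        return [len(r["smiles"]) for r in records]

    def transform_outputs(self, records, level):
        # Outputs are specific to the requested level of fidelity
        return [r[level] for r in records]

    def prepare_message(self, model, training=False):
        # Serialize the model so it can be sent to another process or node
        return pickle.dumps(model)

    def retrain(self, model_msg, inputs, outputs, lower_fidelities=None):
        # This toy model ignores the inputs and fits only the mean correction
        model = pickle.loads(model_msg)
        base = lower_fidelities if lower_fidelities is not None else [0.0] * len(outputs)
        model["delta"] = mean(y - b for y, b in zip(outputs, base))
        return pickle.dumps(model)

    def update(self, model, update_msg):
        # Merge the result of training into the local copy of the model
        model.update(pickle.loads(update_msg))
        return model

    def score(self, model_msg, inputs, lower_fidelities=None):
        # Predict the target level as the lower level plus the learned correction
        model = pickle.loads(model_msg)
        base = lower_fidelities if lower_fidelities is not None else [0.0] * len(inputs)
        return [b + model["delta"] for b in base]


scorer = ToyDeltaScorer()
model = {"delta": 0.0}

# Prepare data and the model message locally
inputs = scorer.transform_inputs(records)
outputs = scorer.transform_outputs(records, "high")
lower_fidelities = scorer.transform_outputs(records, "low")
model_msg = scorer.prepare_message(model, training=True)

# These two calls are the ones that could run on a remote node
update_msg = scorer.retrain(model_msg, inputs, outputs, lower_fidelities=lower_fidelities)
model = scorer.update(model, update_msg)

# Score a new molecule for which only the low-fidelity value is known
new_inputs = [4]
predictions = scorer.score(scorer.prepare_message(model), new_inputs, lower_fidelities=[4.0])
```

Because only serialized messages and plain lists cross the ``retrain`` and ``score`` boundaries,
those calls can be dispatched to a different process without sharing the local model object.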