Steer

ExaMol scales to large supercomputers by managing many tasks concurrently. The logic for when to launch tasks and how to process completed tasks is defined in Colmena “Thinker” classes. ExaMol contains several Thinkers, each of which uses a different strategy for deploying tasks on a supercomputer.

Available Methods

Each steering strategy is associated with a specific Solution strategy.

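=================  ============================  =============================================================
Class              Solution                      Description
=================  ============================  =============================================================
BruteForceThinker  SolutionSpecification         Evaluate all molecules in an initial population
SingleStepThinker  SingleFidelityActiveLearning  Run all recipes for each selected molecule
PipelineThinker    MultiFidelityActiveLearning   Run the next step in a pipeline each time a model is selected
=================  ============================  =============================================================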

SingleStepThinker as an Example

The SingleStepThinker is a good example for explaining how Thinkers work in ExaMol.

The strategy for this Thinker has three parts:

  1. Never leave nodes on the supercomputer idle

  2. Update the list of selected calculations with new data as quickly as possible

  3. Wait until resources are free before submitting the next calculation

This strategy is achieved by a series of simple policies, such as:

  • Submit a new quantum chemistry calculation when another completes

  • Begin re-training models as soon as a recipe is complete for any molecule

  • Re-run inference for all molecules as soon as all models finish training
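The way these policies chain together can be sketched in plain Python. This is only an illustration: the class `PolicySketch`, its method names, and the rule that a recipe completing during a training round does not start a second round are all assumptions made for this example, not ExaMol or Colmena code.

```python
class PolicySketch:
    """Hypothetical stand-in for a Thinker that records which action each event triggers"""

    def __init__(self, n_models: int):
        self.n_models = n_models   # Number of models in the ensemble
        self.models_pending = 0    # Models still training in the current round
        self.actions = []          # Log of the actions triggered so far

    def simulation_finished(self, recipe_complete: bool) -> None:
        # Policy 1: a finished calculation immediately triggers a new one
        self.actions.append('submit simulation')
        # Policy 2: retrain once a recipe is complete for any molecule
        # (assumption: skip if a training round is already in progress)
        if recipe_complete and self.models_pending == 0:
            self.models_pending = self.n_models
            self.actions.append('start training')

    def model_trained(self) -> None:
        if self.models_pending == 0:
            return  # No training round in progress
        self.models_pending -= 1
        # Policy 3: run inference only after every model has finished training
        if self.models_pending == 0:
            self.actions.append('run inference')


sketch = PolicySketch(n_models=2)
sketch.simulation_finished(recipe_complete=False)
sketch.simulation_finished(recipe_complete=True)
sketch.model_trained()
sketch.model_trained()
print(sketch.actions)
```

In this sketch, each event handler does a small amount of bookkeeping and decides whether the next stage should start, which is the role the decorated methods play in a real Thinker.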

These policy steps are defined as methods of the Thinker marked with a special decorator (see Colmena’s quickstart). For example, the “submit a new quantum chemistry calculation” policy is defined by a pair of methods:

from colmena.models import Result
from colmena.thinker import result_processor, task_submitter

from examol.steer.base import MoleculeThinker


class SingleStepThinker(MoleculeThinker):
    ...
    @result_processor(topic='simulation')
    def store_simulation(self, result: Result):
        """Store the output of a simulation"""
        # Trigger a new simulation to start
        self.rec.release()
        ...

    @task_submitter()
    def submit_simulation(self):
        """Submit a simulation task when resources are available"""
        # Get the next molecule record and the computation suggested for it
        record, suggestion = next(self.task_iterator)
        ...

store_simulation runs when a simulation completes. It starts by marking resources as available, then updates the database and, if conditions are right, retrains the models. submit_simulation starts as soon as resources are marked as free, which keeps the supercomputer occupied.
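The release-then-submit handshake between these two methods can be imitated with the Python standard library alone. The sketch below is a stand-in, not the Colmena implementation: a `threading.Semaphore` plays the role of the resource counter, a thread pool plays the role of the supercomputer, and `N_SLOTS`, `simulate`, and the squared-number "result" are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

N_SLOTS = 2                            # Hypothetical number of simulation slots
task_iterator = iter(range(6))         # Stand-in for self.task_iterator
slots = threading.Semaphore(N_SLOTS)   # Stand-in for the resource counter
lock = threading.Lock()
stats = {'running': 0, 'peak': 0}      # Tracks how many simulations run at once
results = []


def simulate(task_id):
    """Stand-in for a quantum chemistry calculation"""
    with lock:
        stats['running'] += 1
        stats['peak'] = max(stats['peak'], stats['running'])
    time.sleep(0.01)
    with lock:
        stats['running'] -= 1
    return task_id ** 2


def store_simulation(future):
    """Runs when a simulation completes: free the slot first, then store the result"""
    slots.release()
    with lock:
        results.append(future.result())


def submit_simulations(pool):
    """Wait for a free slot before submitting each task"""
    for task_id in task_iterator:
        slots.acquire()  # Blocks until store_simulation releases a slot
        pool.submit(simulate, task_id).add_done_callback(store_simulation)


# The pool has more workers than slots, so the semaphore is what limits concurrency
with ThreadPoolExecutor(max_workers=4) as pool:
    submit_simulations(pool)

print(sorted(results), stats['peak'])
```

Releasing the slot before storing the result mirrors the ordering in store_simulation: the next calculation can be submitted while the completed one is still being processed.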

The other methods keep the machine learning models up to date and ensure that the task iterator (self.task_iterator) produces the best possible computations to run.
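One way to picture a task iterator that always reflects the newest predictions is a generator that re-reads a shared ranking every time it is asked for the next task. The sketch below is an assumption-laden illustration, not ExaMol's iterator: the function, the molecule names, and the in-place ranking update are hypothetical, and it yields bare names rather than the (record, suggestion) pairs shown above.

```python
def task_iterator(ranking, submitted):
    """Yield the best-ranked molecule that has not been submitted yet.

    `ranking` is a list ordered best first. Other code may update it in place,
    and the ranking is re-read on every request, so updates take effect at once.
    """
    while True:
        for name in ranking:
            if name not in submitted:
                submitted.add(name)
                yield name
                break  # Restart from the top of the (possibly updated) ranking
        else:
            return  # Every molecule in the ranking has been submitted


ranking = ['mol_a', 'mol_b', 'mol_c']
submitted = set()
tasks = task_iterator(ranking, submitted)

first = next(tasks)
# New model predictions arrive and reorder the candidates in place
ranking[:] = ['mol_c', 'mol_a', 'mol_b']
second = next(tasks)
third = next(tasks)
print(first, second, third)
```

Because the generator restarts its scan on each request, a reordering that arrives between two submissions changes which molecule is chosen next without having to rebuild the iterator.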