examol.select

Collection of tools used to identify which computations to perform next

examol.select.base

Interfaces for selector classes

class examol.select.base.RankingSelector(to_select: int, maximize: bool | Sequence[bool] = True)[source]

Bases: Selector

Base class where we assign an independent score to each possibility.

Implementations should assume that the goal is maximization, because this abstract class negates the samples for any objective that is to be minimized.

Parameters:
  • to_select – How many computations to select per batch

  • maximize – Whether to select entries with high or low values of the samples. Provide either a single value if maximizing or minimizing all objectives, or a list specifying whether to maximize each objective.
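The negation described above can be illustrated with a short sketch. The helper name orient_for_maximization is hypothetical and not part of ExaMol, and nested lists indexed as [recipe][record][model] stand in for the 3D NumPy array a selector would receive.

```python
from typing import Sequence, Union


def orient_for_maximization(
    samples: list,
    maximize: Union[bool, Sequence[bool]] = True,
) -> list:
    """Flip the sign of objectives being minimized (hypothetical helper).

    ``samples`` is a nested list indexed as [recipe][record][model].
    """
    n_recipes = len(samples)
    # A single bool applies to every objective
    flags = [maximize] * n_recipes if isinstance(maximize, bool) else list(maximize)
    if len(flags) != n_recipes:
        raise ValueError("Need one maximize flag per recipe")
    return [
        [[value if flag else -value for value in record] for record in recipe]
        for recipe, flag in zip(samples, flags)
    ]


# Two recipes, two records, two models
samples = [
    [[1.0, 2.0], [3.0, 4.0]],  # recipe 0: maximized, left unchanged
    [[0.5, 0.7], [0.1, 0.3]],  # recipe 1: minimized, sign flipped
]
oriented = orient_for_maximization(samples, maximize=[True, False])
```

After the flip, a subclass can rank every objective as if larger were better.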

start_gathering()[source]

Prepare to gather new batches of potential computations

class examol.select.base.Selector(to_select: int)[source]

Bases: object

Base class for selection algorithms

Using a Selector

Selectors function in two phases: gathering and dispensing.

Selectors are in the gathering phase when first created. Add potential computations in batches with add_possibilities(), which takes a list of keys describing the computations and a distribution of probable scores (e.g., predictions from different models in an ensemble) for each computation. Sample arrays are 3D and shaped num_recipes x num_samples x num_models, where num_samples is the number of computations in the batch.

The dispensing phase starts by calling dispense(). dispense() yields selected computations, each identified by a key acquired during the gathering phase and paired with a score. Selections are generated from highest to lowest priority.

Creating a Selector

You must implement three operations:

  • start_gathering(), which is called at the beginning of a gathering phase and must clear state from the previous selection round.

  • add_possibilities(), which updates the state of a selection to account for a new batch of computations. For example, you could update a ranked list of best-scored computations.

  • dispense(), which generates to_select selections in ranked order from best to worst.
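As a rough illustration of the three operations above, the following sketch builds a toy selector that keeps the entries with the highest mean score across models. TopMeanSelector is a hypothetical stand-in that does not subclass Selector, handles a single recipe only, and uses nested lists in place of NumPy arrays.

```python
import heapq
from statistics import mean
from typing import Iterator


class TopMeanSelector:
    """Toy selector that keeps the entries with the highest mean score.

    Not an ExaMol class: it only mirrors the three operations for one recipe.
    """

    def __init__(self, to_select: int):
        self.to_select = to_select
        self.gathering = True
        self._best: list = []  # min-heap of (score, counter, key)
        self._count = 0  # tie-breaker so keys are never compared

    def start_gathering(self) -> None:
        # Clear state left over from the previous selection round
        self._best.clear()
        self._count = 0
        self.gathering = True

    def add_possibilities(self, keys: list, samples: list) -> None:
        # samples is indexed [recipe][record][model]; only recipe 0 is used
        if not self.gathering:
            raise RuntimeError("Call start_gathering() before adding")
        for key, model_scores in zip(keys, samples[0]):
            item = (mean(model_scores), self._count, key)
            self._count += 1
            if len(self._best) < self.to_select:
                heapq.heappush(self._best, item)
            else:
                heapq.heappushpop(self._best, item)  # drop the worst entry

    def dispense(self) -> Iterator[tuple]:
        self.gathering = False
        # Yield from best to worst
        for score, _, key in sorted(self._best, reverse=True):
            yield key, score


selector = TopMeanSelector(to_select=2)
selector.add_possibilities(["a", "b"], [[[0.1, 0.3], [0.9, 0.7]]])
selector.add_possibilities(["c", "d"], [[[0.5, 0.5], [0.0, 0.2]]])
picks = list(selector.dispense())  # keys "b" then "c"
```

The bounded heap keeps memory proportional to to_select no matter how many batches are added, which is one reason to update a ranked list incrementally rather than store every candidate.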

add_possibilities(keys: list, samples: ndarray, **kwargs)[source]

Add potential options to be selected

Parameters:
  • keys – Labels by which to identify the records being evaluated

  • samples – A distribution of scores for each record. Expects a 3-dimensional array of shape (num recipes) x (num records) x (num models)

dispense() → Iterator[tuple[object, float]][source]

Dispense selected computations from highest- to lowest-rated.

Yields:

A pair of “selected computation” (as identified by the keys provided originally) and a score.

gathering: bool

Whether the selector is waiting to accept more possibilities.

multiobjective: bool = False

Whether the selector supports multi-objective optimization

start_gathering()[source]

Prepare to gather new batches of potential computations

to_select: int

Number of computations to select

update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]

Update the selector given the current database

Parameters:
  • database – Known molecules

  • recipes – Recipes being optimized

examol.select.baseline

Useful baseline strategies

class examol.select.baseline.GreedySelector(to_select: int, maximize: bool | Sequence[bool] = True)[source]

Bases: RankingSelector

Select computations which are rated the best without any regard to model uncertainty

class examol.select.baseline.RandomSelector(to_select: int)[source]

Bases: Selector

Select which computations to perform at random

multiobjective: bool = True

Whether the selector supports multi-objective optimization

start_gathering()[source]

Prepare to gather new batches of potential computations

examol.select.bayes

Acquisition functions derived from Bayesian optimization

class examol.select.bayes.ExpectedImprovement(to_select: int, maximize: bool, epsilon: float = 0)[source]

Bases: RankingSelector

Rank entries according to their expected improvement

Parameters:
  • to_select – How many computations to select per batch

  • maximize – Whether to select entries with the highest score

  • epsilon – Parameter which controls degree of exploration
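For orientation, the textbook form of expected improvement can be sketched as follows. This assumes the ensemble predictions for a candidate are treated as draws from a normal distribution and that epsilon is subtracted from the improvement; both are assumptions of this sketch rather than statements about ExaMol's implementation, and expected_improvement is a hypothetical function name.

```python
from statistics import NormalDist, mean, stdev


def expected_improvement(model_preds, best_so_far, epsilon=0.0):
    """Expected improvement for one candidate, assuming maximization.

    Assumes a normal distribution fitted to the ensemble predictions.
    """
    mu = mean(model_preds)
    sigma = stdev(model_preds)
    improvement = mu - best_so_far - epsilon
    if sigma == 0:
        # No uncertainty: the improvement is known exactly
        return max(improvement, 0.0)
    z = improvement / sigma
    std_normal = NormalDist()
    return improvement * std_normal.cdf(z) + sigma * std_normal.pdf(z)


# Same mean prediction, different ensemble spread
confident = expected_improvement([1.0, 1.0, 1.0], best_so_far=1.0)
uncertain = expected_improvement([0.5, 1.0, 1.5], best_so_far=1.0)
```

In this form, a candidate whose mean only matches the best value so far still scores above zero when the models disagree, which is how the acquisition function rewards uncertainty.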

update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]

Update the selector given the current database

Parameters:
  • database – Known molecules

  • recipes – Recipes being optimized

examol.select.botorch

Employ the acquisition functions from BoTorch

class examol.select.botorch.BOTorchSequentialSelector(acq_function_type: type[botorch.acquisition.AcquisitionFunction], acq_options: dict[str, object], to_select: int, acq_options_updater: Callable[[BOTorchSequentialSelector, ndarray], dict] | None = None, maximize: bool = True)[source]

Bases: RankingSelector

Use an acquisition function from BoTorch to score candidates assuming a pool size of \(q=1\)

Provide the acquisition function type and any options needed to configure it. Options can be updated by supplying a function which updates them based on the properties of molecules which have been evaluated so far.

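For example, Expected Improvement with an updater that tracks the best observed value would be:

import numpy as np
from botorch.acquisition import qExpectedImprovement
from examol.select.botorch import BOTorchSequentialSelector

def update_fn(selector: 'BOTorchSequentialSelector', obs: np.ndarray) -> dict:
    # Refresh the incumbent value from the observations gathered so far
    return {'best_f': max(obs) if selector.maximize else min(obs)}

selector = BOTorchSequentialSelector(qExpectedImprovement,
                                     acq_options={'best_f': 0.5},
                                     acq_options_updater=update_fn,
                                     to_select=1)

Parameters: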
  • acq_function_type – Class of the acquisition function

  • acq_options – Dictionary of options passed to the acquisition function maker

  • acq_options_updater – Function which takes the current selector and an array of observations of shape (num molecules) x (num recipes), and returns a dictionary of updated options

  • maximize – Whether to maximize or minimize the objectives

  • to_select – Number of top candidates to select each round

multiobjective: bool = True

Whether the selector supports multi-objective optimization

update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]

Update the selector given the current database

Parameters:
  • database – Known molecules

  • recipes – Recipes being optimized

class examol.select.botorch.EHVISelector(to_select: int, maximize: bool | Sequence[bool] = True)[source]

Bases: BOTorchSequentialSelector

Rank entries based on the Expected Hypervolume Improvement (EHVI)

EHVI is a multi-objective optimization score which measures how much a new point will expand the Pareto surface. We use the Monte Carlo implementation of EHVI of Daulton et al., but do not yet support the algorithm's batch-aware implementation.

Constructing the Pareto surface requires a reference point, defined such that points farther from it are better. We use the minimum value of objectives which are being maximized and the maximum value of those being minimized.
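The reference point rule can be sketched in a few lines. The function name reference_point and the [molecule][objective] layout of the observations are assumptions made for illustration.

```python
from typing import Sequence


def reference_point(observations: Sequence[Sequence[float]],
                    maximize: Sequence[bool]) -> list:
    """Worst observed value of each objective (hypothetical helper).

    ``observations`` is indexed [molecule][objective].
    """
    columns = list(zip(*observations))  # one tuple of values per objective
    return [min(col) if flag else max(col) for col, flag in zip(columns, maximize)]


observed = [
    [1.0, 10.0],
    [2.0, 12.0],
    [3.0, 11.0],
]
# First objective is maximized, second is minimized
ref = reference_point(observed, maximize=[True, False])
# ref == [1.0, 12.0]
```

Every observed molecule is then at least as good as the reference point in each objective.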

Parameters:
  • maximize – Whether to maximize or minimize the objectives

  • to_select – Number of top candidates to select each round

class examol.select.botorch.EnsembleCovarianceModel(*args: Any, **kwargs: Any)[source]

Bases: Model

Model which generates a multivariate Gaussian distribution given samples from an ensemble of models

property num_outputs: int

posterior(X: torch.Tensor, output_indices: List[int] | None = None, observation_noise: bool = False, posterior_transform: botorch.acquisition.objective.PosteriorTransform | None = None, **kwargs: Any) → botorch.posteriors.Posterior[source]
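To make the idea behind EnsembleCovarianceModel concrete, the sketch below estimates the mean vector and covariance matrix of a multivariate Gaussian from ensemble predictions. It assumes each model contributes one sample per candidate and uses the unbiased (n - 1) estimator; ensemble_gaussian is a hypothetical helper, and the class itself may compute these quantities differently.

```python
from statistics import mean


def ensemble_gaussian(preds):
    """Mean vector and covariance matrix from ensemble predictions.

    ``preds`` is indexed [model][candidate]; each model is one sample.
    """
    n_models = len(preds)
    columns = list(zip(*preds))  # predictions grouped by candidate
    mu = [mean(col) for col in columns]
    cov = [
        [
            sum((a - mu[i]) * (b - mu[j])
                for a, b in zip(columns[i], columns[j])) / (n_models - 1)
            for j in range(len(columns))
        ]
        for i in range(len(columns))
    ]
    return mu, cov


# Three models, two candidates whose predictions move together
preds = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mu, cov = ensemble_gaussian(preds)
# mu == [2.0, 4.0]; cov == [[1.0, 2.0], [2.0, 4.0]]
```

The off-diagonal terms capture how predictions for different candidates vary together across the ensemble, which is the information a joint Gaussian provides beyond independent per-candidate variances.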