examol.select¶
Collection of tools used to identify which computations to perform next
examol.select.base¶
Interfaces for selector classes
- class examol.select.base.RankingSelector(to_select: int, maximize: bool | Sequence[bool] = True)[source]¶
Bases:
Selector
Base class where we assign an independent score to each possibility.
Implementations should assume that the goal is maximization; this abstract class negates the samples for any objective that is to be minimized.
- Parameters:
to_select – How many computations to select per batch
maximize – Whether to select entries with high or low values of the samples. Provide a single value if maximizing or minimizing all objectives, or a list indicating whether to maximize each objective.
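The sign-flip trick described above can be sketched as follows. This is a hypothetical standalone helper, not examol's actual implementation: objectives marked for minimization have their samples negated so that subclasses can always rank by maximization.

```python
import numpy as np

def apply_maximize(samples: np.ndarray, maximize) -> np.ndarray:
    """Negate samples of objectives that should be minimized.

    samples has shape (num_recipes, num_samples, num_models);
    maximize is a single bool or one bool per recipe.
    """
    num_recipes = samples.shape[0]
    # Broadcast a single bool flag to every objective (recipe)
    flags = np.broadcast_to(np.atleast_1d(maximize), (num_recipes,))
    signs = np.where(flags, 1.0, -1.0)
    return samples * signs[:, None, None]

samples = np.arange(4.).reshape(2, 2, 1)  # 2 recipes, 2 samples, 1 model
flipped = apply_maximize(samples, [True, False])
# Recipe 0 is unchanged; recipe 1 is negated
```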
- class examol.select.base.Selector(to_select: int)[source]¶
Bases:
object
Base class for selection algorithms
Using a Selector
Selectors function in two phases: gathering and dispensing.
Selectors are in the gathering phase when first created. Add potential computations in batches with
add_possibilities()
, which takes a list of keys describing the computations and a distribution of probable scores (e.g., predictions from different models in an ensemble) for each computation. Sample arrays are 3D and shaped num_recipes x num_samples x num_models.
The dispensing phase starts by calling
dispense()
. Each call to
dispense()
generates selected computations from the list of keys acquired during the gathering phase, each paired with a score. Selections are generated from highest to lowest priority.
Creating a Selector
You must implement three operations:
start_gathering()
, which is called at the beginning of a gathering phase and must clear state from the previous selection round.
add_possibilities()
updates the state of a selection to account for a new batch of computations. For example, you could update a ranked list of the best-scored computations.
dispense()
generates a list of to_select computations in ranked order from best to worst.
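The three-operation protocol above can be illustrated with a minimal standalone sketch. The class below mirrors the documented method names but does not subclass examol's Selector; it simply ranks records by their mean predicted score.

```python
import numpy as np

class GreedyMeanSelector:
    """Sketch of the Selector protocol: rank records by mean predicted score."""

    def __init__(self, to_select: int):
        self.to_select = to_select
        self.start_gathering()

    def start_gathering(self):
        """Clear state from the previous selection round"""
        self._keys = []
        self._scores = []

    def add_possibilities(self, keys: list, samples: np.ndarray):
        """samples: (num_recipes, num_records, num_models) array of scores"""
        # Score each record by its mean over recipes and models
        self._keys.extend(keys)
        self._scores.extend(samples.mean(axis=(0, 2)).tolist())

    def dispense(self):
        """Yield (key, score) pairs from best to worst"""
        order = np.argsort(self._scores)[::-1][:self.to_select]
        for i in order:
            yield self._keys[i], self._scores[i]

selector = GreedyMeanSelector(to_select=2)
samples = np.array([[[1., 2.], [5., 5.], [0., 1.]]])  # 1 recipe, 3 records, 2 models
selector.add_possibilities(['a', 'b', 'c'], samples)
picks = list(selector.dispense())  # best two records, highest score first
```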
- add_possibilities(keys: list, samples: ndarray, **kwargs)[source]¶
Add potential options to be selected
- Parameters:
keys – Labels by which to identify the records being evaluated
samples – A distribution of scores for each record. Expects a 3-dimensional array of shape (num recipes) x (num records) x (num models)
- dispense() Iterator[tuple[object, float]] [source]¶
Dispense selected computations from highest- to lowest-rated.
- Yields:
A pair of “selected computation” (as identified by the keys provided originally) and a score.
- update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]¶
Update the selector given the current database
- Parameters:
database – Known molecules
recipes – Recipes being optimized
examol.select.baseline¶
Useful baseline strategies
examol.select.bayes¶
Acquisition functions derived from Bayesian optimization
- class examol.select.bayes.ExpectedImprovement(to_select: int, maximize: bool, epsilon: float = 0)[source]¶
Bases:
RankingSelector
Rank entries according to their expected improvement
- Parameters:
to_select – How many computations to select per batch
maximize – Whether to select entries with the highest score
epsilon – Parameter which controls degree of exploration
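Expected improvement can be estimated directly from the ensemble samples. The sketch below assumes the Monte Carlo form EI = E[max(y - (best + epsilon), 0)]; examol's exact formulation may differ, and the helper name is hypothetical.

```python
import numpy as np

def expected_improvement(samples: np.ndarray, best: float,
                         epsilon: float = 0.) -> np.ndarray:
    """Monte Carlo expected improvement for a single objective.

    samples: (num_records, num_samples) predicted scores per record
    best: best observed value so far
    epsilon: exploration parameter; larger values demand bigger improvements
    """
    improvement = samples - (best + epsilon)
    # Average the positive part of the improvement over the samples
    return np.maximum(improvement, 0.).mean(axis=1)

samples = np.array([[0.8, 1.2],   # straddles the best observed value
                    [2.0, 2.0],   # clearly better
                    [0.1, 0.2]])  # clearly worse
ei = expected_improvement(samples, best=1.0)
```

Note how a record whose samples straddle the current best still earns a nonzero score, which is how EI rewards uncertainty as well as predicted value.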
- update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]¶
Update the selector given the current database
- Parameters:
database – Known molecules
recipes – Recipes being optimized
examol.select.botorch¶
Employ the acquisition functions from BOTorch
- class examol.select.botorch.BOTorchSequentialSelector(acq_function_type: type[botorch.acquisition.AcquisitionFunction], acq_options: dict[str, object], to_select: int, acq_options_updater: Callable[[BOTorchSequentialSelector, ndarray], dict] | None = None, maximize: bool = True)[source]¶
Bases:
RankingSelector
Use an acquisition function from BOTorch to score candidates assuming a pool size of \(q=1\)
Provide the acquisition function type and any options needed to configure it. Options can be updated by supplying a function which updates them based on the properties of molecules which have been evaluated so far.
For example, Expected Improvement which updates the maximum observed value would be
def update_fn(selector: 'BOTorchSequentialSelector', obs: np.ndarray) -> dict:
    return {'best_f': max(obs) if selector.maximize else min(obs)}

selector = BOTorchSequentialSelector(qExpectedImprovement,
                                     acq_options={'best_f': 0.5},
                                     acq_options_updater=update_fn,
                                     to_select=1)
- Parameters:
acq_function_type – Class of the acquisition function
acq_options – Dictionary of options passed to the acquisition function maker
acq_options_updater – Function which takes the current selector and an array of observations of shape (num molecules) x (num recipes)
maximize – Whether to maximize or minimize the objectives
to_select – Number of top candidates to select each round
- update(database: MoleculeStore, recipes: Sequence[PropertyRecipe])[source]¶
Update the selector given the current database
- Parameters:
database – Known molecules
recipes – Recipes being optimized
- class examol.select.botorch.EHVISelector(to_select: int, maximize: bool | Sequence[bool] = True)[source]¶
Bases:
BOTorchSequentialSelector
Rank entries based on the Expected Hypervolume Improvement (EHVI)
EHVI is a multi-objective optimization score which measures how much a new point will expand the Pareto surface. We use the Monte Carlo implementation of EHVI from Daulton et al., but do not yet support the algorithm's batch-aware implementation.
Constructing the Pareto surface requires the definition of a reference point where farther from the reference point is better. We use the minimum value of objectives which are being maximized and the maximum value of those being minimized.
- Parameters:
maximize – Whether to maximize or minimize the objectives
to_select – Number of top candidates to select each round
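The reference-point rule described above (minimum observed value for maximized objectives, maximum for minimized ones) can be sketched as follows. This is a hypothetical helper, not examol's actual implementation.

```python
import numpy as np

def reference_point(observations: np.ndarray, maximize) -> np.ndarray:
    """Build an EHVI reference point from observed objective values.

    observations: (num_molecules, num_objectives) array
    maximize: one bool per objective
    """
    maximize = np.asarray(maximize)
    # Worst observed value in each direction of improvement
    return np.where(maximize, observations.min(axis=0), observations.max(axis=0))

obs = np.array([[1., 10.],
                [3., 30.],
                [2., 20.]])
ref = reference_point(obs, [True, False])  # maximize obj 0, minimize obj 1
```

Every Pareto-optimal point then dominates the reference point, which is what the hypervolume computation requires.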