examol.steer

Collection of policies which interleave simulation and machine learning tasks.

Policies are defined as Colmena Thinker classes.

examol.steer.base

Base class that defines core routines used across many steering policies

class examol.steer.base.MoleculeThinker(queues: ColmenaQueues, rec: ResourceCounter, run_dir: Path, recipes: Sequence[PropertyRecipe], solution: SolutionSpecification, search_space: list[Path | str], database: MoleculeStore, pool: ProcessPoolExecutor)[source]

Bases: BaseThinker

Base for a thinker which performs molecular design

Parameters:
  • queues – Queues used to communicate with the task server

  • rec – Counter used to track availability of different resources

  • run_dir – Directory in which to store results

  • recipes – List of recipes to compute

  • solution – Description of how to solve the problem

  • database – List of molecule records

  • search_space – Search space of molecules to be evaluated, provided as a list of “.smi” or “.json” files

  • num_workers – Number of workers to use locally for the thinker

database: MoleculeStore

Access to the data available to the thinker

iterate_over_search_space(only_smiles: bool = False) → Iterator[MoleculeRecord | str][source]

Function to produce a stream of molecules from the input files

Parameters:

only_smiles – Whether to return only the SMILES string rather than the full record

Yields:

A MoleculeRecord for each molecule in the search space, or just its SMILES string
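
The method can be used from a subclass or driving script; a minimal sketch, assuming thinker is an already-constructed MoleculeThinker subclass:

   # Stream the search space as SMILES strings only
   for smiles in thinker.iterate_over_search_space(only_smiles=True):
       ...  # e.g., accumulate strings for featurization

   # With only_smiles=False (the default), full MoleculeRecord objects are yielded
   for record in thinker.iterate_over_search_space():
       ...  # e.g., inspect previously computed properties on the record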

store_simulation(result: Result)[source]

Store the output of a simulation

submit_simulation()[source]

Submit a simulation task when resources are available

task_iterator() → Iterator[tuple[MoleculeRecord, Iterable[PropertyRecipe], SimulationRequest]][source]

Iterate over the next tasks in the task queue

Yields:
  • Molecule being processed

  • Recipes being computed

  • Simulation to execute
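
Each yielded tuple can be unpacked directly. A minimal sketch of a consumer loop, assuming thinker is an already-constructed subclass (the loop body is illustrative):

   for record, recipes, request in thinker.task_iterator():
       # record:  MoleculeRecord for the molecule being processed
       # recipes: PropertyRecipe objects this simulation will help complete
       # request: SimulationRequest describing the computation to submit
       ...  # e.g., hand the request off to the task server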

task_queue: list[tuple[str, float]]

List of tasks to run. Each entry is a SMILES string and a score, and entries are arranged in descending order of priority

task_queue_lock: Condition

Lock used to control access to task_queue
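
Because the queue is shared between threads, reads and writes should hold task_queue_lock. A minimal sketch of the access pattern, assuming thinker is a constructed instance and that larger scores mean higher priority:

   # Add an entry and wake any waiting consumers
   with thinker.task_queue_lock:
       thinker.task_queue.append(('C1=CC=CC=C1', 0.75))           # (SMILES, score)
       thinker.task_queue.sort(key=lambda x: x[1], reverse=True)  # highest score first (assumption)
       thinker.task_queue_lock.notify_all()

   # Wait for work to appear, then take the top entry
   with thinker.task_queue_lock:
       while len(thinker.task_queue) == 0:
           thinker.task_queue_lock.wait()
       smiles, score = thinker.task_queue.pop(0)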

examol.steer.baseline

Baseline methods for steering a molecular design campaign

class examol.steer.baseline.BruteForceThinker(queues: ColmenaQueues, run_dir: Path, recipes: Sequence[PropertyRecipe], solution: SolutionSpecification, search_space: list[Path | str], database: MoleculeStore, pool: ProcessPoolExecutor, num_workers: int = 1, overselection: float = 0)[source]

Bases: MoleculeThinker

Run the full set of molecules selected at the start of the campaign

Parameters:
  • queues – Queues used to communicate with the task server

  • run_dir – Directory in which to store results

  • recipes – List of recipes to compute

  • solution – Description of how to solve the problem

  • database – List of molecule records

  • search_space – Search space of molecules to be evaluated, provided as a list of “.smi” or “.json” files

  • num_workers – Number of simulations to run in parallel

  • overselection – Additional fraction of molecules to select beyond the number requested by the user. The number of additional molecules is solution.num_to_run * overselection. Used to ensure the target number of molecules is evaluated even if some computations fail (see the sketch after this parameter list).
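
A minimal sketch of the arithmetic, taking the description of overselection literally (the exact rounding used by the class is not shown here):

   num_to_run = 100       # solution.num_to_run, an assumed value for illustration
   overselection = 0.2    # fraction requested above the target

   extra = int(num_to_run * overselection)  # 20 additional molecules
   total_selected = num_to_run + extra      # 120 molecules submitted in total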

startup()[source]

Pre-populate the database, if needed.

examol.steer.single

Single-objective and single-fidelity implementation of active learning. As easy as we get

class examol.steer.single.SingleStepThinker(queues: ColmenaQueues, run_dir: Path, recipes: Sequence[PropertyRecipe], solution: SingleFidelityActiveLearning, search_space: list[Path | str], database: MoleculeStore, pool: ProcessPoolExecutor, num_workers: int = 2, inference_chunk_size: int = 10000)[source]

Bases: MoleculeThinker

A thinker which submits all computations needed to evaluate a molecule whenever it is selected

Parameters:
  • queues – Queues used to communicate with the task server

  • run_dir – Directory in which to store logs, etc.

  • recipes – Recipes used to compute the target properties

  • database – Connection to the store of molecular data

  • solution – Settings related to tools used to solve the problem (e.g., active learning strategy)

  • search_space – Search space of molecules. Provided as a list of paths to “.smi” files

  • num_workers – Number of simulation tasks to run in parallel

  • inference_chunk_size – Number of molecules to run inference on per task
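
Putting the parameters above together, a minimal construction sketch; the queues, recipe, solution, and store objects are assumed to exist and their creation is not shown:

   from concurrent.futures import ProcessPoolExecutor
   from pathlib import Path

   from examol.steer.single import SingleStepThinker

   thinker = SingleStepThinker(
       queues=queues,                     # ColmenaQueues connected to a task server
       run_dir=Path('run'),               # where logs and results are written
       recipes=[recipe],                  # properties to compute
       solution=solution,                 # SingleFidelityActiveLearning settings
       search_space=[Path('space.smi')],  # list of ".smi" files
       database=store,                    # MoleculeStore holding known records
       pool=ProcessPoolExecutor(4),       # local pool used by the thinker
       num_workers=2,                     # simulations to run in parallel
       inference_chunk_size=10000,        # molecules per inference task
   )
   thinker.run()  # Colmena thinkers expose Thread-style run()/start() methods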

count_training_size(recipe: PropertyRecipe) → int[source]

Count the number of database entries available for training a model for the given recipe

Parameters:

recipe – Recipe being assessed

Returns:

Number of records for which this property is defined
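
For instance, a driving script might check that enough labelled data exist before retraining; a hypothetical sketch (the threshold and the .recipes attribute are assumptions):

   MIN_TRAINING_SIZE = 10  # illustrative threshold, not part of the class

   if all(thinker.count_training_size(r) >= MIN_TRAINING_SIZE
          for r in thinker.recipes):  # `.recipes` assumed to mirror the constructor argument
       thinker.retrain()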

get_additional_training_information(train_set: list[MoleculeRecord], recipe: PropertyRecipe) → dict[str, object][source]

Determine any additional information to be provided during training

An example could be to gather low-fidelity data with which to augment the training process

Parameters:
  • train_set – Training set for the model

  • recipe – Recipe being trained

Returns:

Additional options
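
A hypothetical override illustrating the intent; the lower_fidelities keyword and the value format are assumptions for illustration, not part of the documented interface:

   from examol.steer.single import SingleStepThinker

   class MyThinker(SingleStepThinker):
       def get_additional_training_information(self, train_set, recipe):
           extra = super().get_additional_training_information(train_set, recipe)
           # Assumed keyword: pass low-fidelity estimates alongside the targets
           extra['lower_fidelities'] = [
               ...  # gather a low-fidelity value for each record in train_set
           ]
           return extra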

property inference_store: Store | None

ProxyStore used for inference tasks

property num_models: int

Number of models being trained by this class

retrain()[source]

Retrain all models

run_inference()[source]

Store inference results, then update the task list

scorer: Scorer

Class used to communicate data and models to distributed workers

search_space_dir: Path

Cache directory for search space

search_space_inputs: list[list[object]]

Inputs (or proxies of inputs) to the machine learning models for each molecule in the search space

search_space_smiles: list[list[str]]

SMILES strings of molecules in the search space
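
The two attributes are parallel, chunked lists; a minimal sketch of how they line up, assuming each outer entry corresponds to one inference chunk:

   for chunk_smiles, chunk_inputs in zip(thinker.search_space_smiles,
                                         thinker.search_space_inputs):
       # chunk_smiles[i] is the SMILES string whose model input (or proxy) is chunk_inputs[i]
       assert len(chunk_smiles) == len(chunk_inputs)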

solution: SingleFidelityActiveLearning

startup()[source]

Pre-populate the database, if needed.

submit_inference() → tuple[list[list[str]], ndarray, list[ndarray]][source]

Submit all molecules to be evaluated, return placeholders for their outputs

Inference tasks are submitted with a few bits of metadata
  • recipe_id: Index of the recipe being evaluated

  • model_id: Index of the model being evaluated

  • chunk_id: Index of the chunk of molecules

  • chunk_size: Number of molecules in chunks being evaluated

Returns:

  • SMILES strings of the molecules being evaluated

  • Boolean array marking whether each inference task is done, with shape n_chunks x recipes x ensemble_size

  • List of arrays in which to store inference results: n_chunks arrays in total, each of shape recipes x batch_size x models (illustrated in the sketch after this list)
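
A minimal sketch of consuming these placeholders, under the assumption that the shapes are exactly as listed above:

   import numpy as np

   smiles, done, results = thinker.submit_inference()

   # `done` is n_chunks x recipes x ensemble_size; entries flip to True as
   # predictions arrive (the flipping itself happens elsewhere in the thinker)
   finished = bool(np.all(done))

   # `results[c]` holds predictions for chunk c, indexed as (recipe, molecule, model);
   # `smiles[c][m]` is the molecule whose predictions sit at results[c][:, m, :]
   ensemble_mean = results[0].mean(axis=-1)  # average over models, assuming chunk 0 is filled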

examol.steer.multifi

Scheduling strategies for multi-fidelity design campaigns

class examol.steer.multifi.PipelineThinker(queues: ColmenaQueues, run_dir: Path, recipes: Sequence[PropertyRecipe], database: MoleculeStore, solution: MultiFidelityActiveLearning, search_space: list[Path | str], pool: ProcessPoolExecutor, num_workers: int = 2, inference_chunk_size: int = 10000)[source]

Bases: SingleStepThinker

Thinker which runs each level of fidelity in incremental steps

See MultiFidelityActiveLearning for a description of the adjustable parameters.

already_in_db: set[str]

InChI keys of molecules from the search space which are already in the database

get_additional_training_information(train_set: list[MoleculeRecord], recipe: PropertyRecipe) → dict[str, object][source]

Determine any additional information to be provided during training

An example could be to gather low-fidelity data with which to augment the training process

Parameters:
  • train_set – Training set for the model

  • recipe – Recipe being trained

Returns:

Additional options

get_level(smiles: str) → int[source]

Get the current step number of a molecule

Parameters:

smiles – SMILES string of molecule in question

Returns:

Step level (0 means no data)
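
A minimal usage sketch; the decision made with the returned level is illustrative:

   level = thinker.get_level('C1=CC=CC=C1')  # benzene, for illustration
   if level == 0:
       ...  # no data yet: start the molecule at the first (cheapest) step
   else:
       ...  # data exist through step `level`; thinker.steps lists the recipes at each step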

get_relevant_database_records() → set[str][source]

Get only the entries from the database which are in the search space

Returns:

InChI keys from the database which are in the search space

property num_levels[source]

solution: MultiFidelityActiveLearning

property steps: Sequence[Sequence[PropertyRecipe]][source]
submit_inference() → tuple[list[list[str]], ndarray, list[ndarray]][source]

Submit all molecules to be evaluated, return placeholders for their outputs

Inference tasks are submitted with a few bits of metadata
  • recipe_id: Index of the recipe being evaluated

  • model_id: Index of the model being evaluated

  • chunk_id: Index of the chunk of molecules

  • chunk_size: Number of molecules in chunks being evaluated

Returns:

  • SMILES strings of the molecules being evaluated

  • Boolean array marking whether each inference task is done, with shape n_chunks x recipes x ensemble_size

  • List of arrays in which to store inference results: n_chunks arrays in total, each of shape recipes x batch_size x models