Start

The Start modules determine which calculations to perform before enough data is available to train a machine learning model.

Available Methods

ExaMol provides a few different start methods, each with a maximum recommended search space size.

Starter          Category   Maximum Search Size
RandomStarter    Fast       100M
KMeansStarter    KMeans     100K

Using a Starter

Provide an iterable over the names of the molecules to consider, along with the number to select:

from examol.start.fast import RandomStarter  # module path assumed from the "Fast" category

starter = RandomStarter()
starting_pool = starter.select(['C', 'O', 'N'], 2)  # Will generate two choices

The starter returns a list of SMILES strings chosen from those provided.

Increase the speed of selection by setting the max_to_consider option of the Starter, which truncates the list of molecule strings to a fixed size before running the selection algorithm.
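
The effect of truncating before selecting can be illustrated with a minimal, standalone sketch. It is not ExaMol's implementation: select_with_cap is a hypothetical helper, and the seed argument and the placeholder molecule names are assumptions added for illustration. The point is that capping the pool means a very large iterator is never read in full.

```python
import random
from itertools import islice
from typing import Iterable, Optional


def select_with_cap(names: Iterable[str], count: int,
                    max_to_consider: Optional[int] = None,
                    seed: Optional[int] = None) -> list[str]:
    """Hypothetical helper: cap the candidate pool, then sample at random."""
    # Read at most `max_to_consider` items (all of them if it is None)
    pool = list(islice(names, max_to_consider))
    rng = random.Random(seed)
    # Sample without replacement; never request more than the pool holds
    return rng.sample(pool, min(count, len(pool)))


# Placeholder names standing in for SMILES strings (assumption for illustration)
candidates = (f'mol-{i}' for i in range(1_000_000))
chosen = select_with_cap(candidates, count=2, max_to_consider=100, seed=0)
```

Only the first 100 entries are ever drawn from the generator here, so the cost of selection depends on the cap rather than on the size of the full search space. The trade-off is that molecules past the cutoff can never be chosen.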