examol.store

Tools for storing and retrieving data about molecules

examol.store.db

Tools for interfacing with data stores

examol.store.db.base

Base classes for storage utilities

class examol.store.db.base.MoleculeStore[source]

Bases: AbstractContextManager, ABC

Base class defining how to interface with a dataset of molecule records.

Data stores persist the data collected by ExaMol to disk during a run. The update_record() call need not persist the data immediately but should ensure that the data is eventually stored on disk. It is better for the update operation not to block until the resulting write has completed.

Stores do not need to support concurrent access from multiple clients, which is why this documentation avoids the word “database.”
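A minimal sketch of the non-blocking update described above, using only the standard library. QueuedStore, its queue, and its writer thread are hypothetical stand-ins rather than ExaMol's implementation, and records are plain dicts with a "key" field for illustration.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path


class QueuedStore:
    """Hypothetical store: update_record() returns at once, a thread writes later."""

    def __init__(self, path: Path):
        self.path = path
        self.records = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)

    def __enter__(self):
        self._writer.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._pending.put(None)  # Sentinel: write once more, then stop
        self._writer.join()
        return False

    def update_record(self, record: dict):
        # Only touches memory and a queue, so the caller never waits on the disk
        with self._lock:
            self.records[record["key"]] = record
        self._pending.put(record["key"])

    def _write_loop(self):
        while True:
            item = self._pending.get()
            self._checkpoint()
            if item is None:
                break

    def _checkpoint(self):
        # Copy under the lock, then write the copy as line-delimited JSON
        with self._lock:
            snapshot = list(self.records.values())
        with self.path.open("w") as fp:
            for record in snapshot:
                fp.write(json.dumps(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "records.json"
    with QueuedStore(out_path) as store:
        store.update_record({"key": "water", "smiles": "O"})
        store.update_record({"key": "methane", "smiles": "C"})
    # Leaving the context manager guarantees the final write has finished
    saved = [json.loads(line) for line in out_path.read_text().splitlines()]
```

Using the store as a context manager is what makes the "eventually stored" guarantee checkable: the exit handler waits for the last write before returning.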

export_records(path: Path)[source]

Save a current copy of the data store to disk as line-delimited JSON

Parameters:

path – Path in which to save all data. Use a “.json.gz”
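The following sketch illustrates line-delimited JSON output with optional GZIP compression. The helpers export_lines and read_lines are hypothetical, and choosing compression from a ".gz" suffix is an assumption made for illustration, not confirmed behavior of export_records.

```python
import gzip
import json
import tempfile
from pathlib import Path


def export_lines(records: list, path: Path) -> None:
    """Write one JSON document per line, compressing if the name ends in .gz"""
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "wt") as fp:
        for record in records:
            fp.write(json.dumps(record) + "\n")


def read_lines(path: Path) -> list:
    """Read the records back, one JSON document per line"""
    opener = gzip.open if path.name.endswith(".gz") else open
    with opener(path, "rt") as fp:
        return [json.loads(line) for line in fp]


demo_records = [{"key": "water", "smiles": "O"}, {"key": "methane", "smiles": "C"}]
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "records.json"
    zipped = Path(tmp) / "records.json.gz"
    export_lines(demo_records, plain)
    export_lines(demo_records, zipped)
    plain_back = read_lines(plain)
    zipped_back = read_lines(zipped)
    # GZIP files begin with the two magic bytes 0x1f 0x8b
    zipped_is_gzip = zipped.read_bytes()[:2] == b"\x1f\x8b"
```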

get_or_make_record(mol_string: str) MoleculeRecord[source]

Get the existing record for a molecule or make a new one

Parameters:

mol_string – String describing a molecule (e.g., SMILES string)

Returns:

Record

iterate_over_records() Iterable[MoleculeRecord][source]

Iterate over all records in data

Yields:

A single record

update_record(record: MoleculeRecord)[source]

Update a single record

Parameters:

record – Record to be updated

update_records(records: Iterable[MoleculeRecord])[source]

Update many records at once

Parameters:

records – Iterator over records to be stored

examol.store.db.memory

Stores that keep the entire dataset in memory

class examol.store.db.memory.InMemoryStore(path: Path | None, write_freq: float = 10.0)[source]

Bases: MoleculeStore

Store all molecule records in memory, write to disk as a single file

The class will start checkpointing as soon as any record is updated but no more frequently than write_freq

Parameters:
  • path – Path from which to read data. Must be a JSON file, which can be compressed with GZIP. Set to None if you do not want data to be stored

  • write_freq – Minimum time between writing checkpoints
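A sketch of the throttling rule described above: write on the first update, then no more often than write_freq. ThrottledCheckpointer is a hypothetical helper, and treating write_freq as a number of seconds is an assumption. The clock is passed in explicitly so the example runs without sleeping.

```python
class ThrottledCheckpointer:
    """Hypothetical helper: permit a write only if write_freq has elapsed."""

    def __init__(self, write_freq: float = 10.0):
        self.write_freq = write_freq
        self._last_write = None  # No checkpoint has been written yet
        self.writes = 0

    def maybe_write(self, now: float) -> bool:
        """Count a write and return True if enough time has passed, else return False"""
        if self._last_write is not None and now - self._last_write < self.write_freq:
            return False
        self._last_write = now
        self.writes += 1
        return True


checkpointer = ThrottledCheckpointer(write_freq=10.0)
# Times at which records are updated, measured from the start of a run
outcomes = [checkpointer.maybe_write(t) for t in (0.0, 1.0, 9.9, 10.0, 15.0, 20.5)]
```

In a real store, an update that arrives during the quiet period would still need to reach disk later, for example through a final write on exit.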

iterate_over_records() Iterable[MoleculeRecord][source]

Iterate over all records in data

Yields:

A single record

update_record(record: MoleculeRecord)[source]

Update a single record

Parameters:

record – Record to be updated

examol.store.models

Data models used for molecular data

class examol.store.models.Conformer(*, xyz: str, xyz_hash: str, date_created: datetime, source: str | None = None, config_name: str | None = None, charge: int, energies: list[EnergyEvaluation] = None)[source]

Bases: BaseModel

Describes a single conformer of a molecule

add_energy(sim_result: SimResult) bool[source]

Add the energy from a simulation result

Parameters:

sim_result – Result to be added

property atoms: Atoms

charge: int

Charge used when relaxing the structure

config_name: str | None

Configuration used to relax the structure, if applicable

date_created: datetime

Date this conformer was inserted

energies: list[EnergyEvaluation]

List of energies for this structure

classmethod from_simulation_result(sim_result: SimResult, source: str = 'relaxation') Conformer[source]

Create a new object from a simulation

Parameters:
  • sim_result – Simulation result

  • source – How this conformer was determined

Returns:

An initialized conformer record which includes energies

classmethod from_xyz(xyz: str, **kwargs)[source]

Create a new object from an XYZ-format string

Parameters:

xyz – XYZ-format description of the molecule

Returns:

An initialized conformer object

get_energy(config_name: str, charge: int, solvent: str | None) float[source]

Get the energy for a certain level

Parameters:
  • config_name – Name of the compute configuration

  • charge – Charge of the molecule

  • solvent – Solvent in which the molecule is dissolved

Returns:

Energy of the target conformer

Raises:

NoSuchConformer – If there is no such energy for this conformer

get_energy_index(config_name: str, charge: int, solvent: str | None) int | None[source]

Get the index of the record for a certain level of energy

Parameters:
  • config_name – Name of the compute configuration

  • charge – Charge of the molecule

  • solvent – Solvent in which the molecule is dissolved

Returns:

Index of the record, if available, or None, if not.
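The relationship between get_energy_index and get_energy can be sketched as follows. Evaluation is a stand-in dataclass, not the EnergyEvaluation model, and the two functions are hypothetical. Raising ValueError is chosen here because MissingData derives from ValueError.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    """Stand-in for EnergyEvaluation with only the fields used for matching"""
    energy: float
    config_name: str
    charge: int
    solvent: Optional[str] = None


def find_energy_index(energies, config_name, charge, solvent) -> Optional[int]:
    """Return the position of the first matching evaluation, or None"""
    wanted = (config_name, charge, solvent)
    for index, record in enumerate(energies):
        if (record.config_name, record.charge, record.solvent) == wanted:
            return index
    return None


def find_energy(energies, config_name, charge, solvent) -> float:
    """Return the matching energy, or raise if the index lookup found nothing"""
    index = find_energy_index(energies, config_name, charge, solvent)
    if index is None:
        raise ValueError(f"No energy for {config_name}, charge={charge}, solvent={solvent}")
    return energies[index].energy


evaluations = [
    Evaluation(energy=-10.0, config_name="fast", charge=0),
    Evaluation(energy=-9.5, config_name="fast", charge=1),
    Evaluation(energy=-10.2, config_name="fast", charge=0, solvent="acn"),
]
```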

source: str | None

Method used to generate this structure (e.g., via relaxation)

xyz: str

XYZ-format description of the atomic coordinates

xyz_hash: str

MD5 hash of the XYZ coordinates
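As an illustration, a hash of this kind can be computed with the standard library. This sketch assumes the hash is an MD5 digest of the UTF-8 encoded XYZ text, which this page does not confirm. hash_xyz is a hypothetical helper.

```python
import hashlib

water_xyz = """3
water
O 0.000 0.000 0.119
H 0.000 0.763 -0.477
H 0.000 -0.763 -0.477
"""


def hash_xyz(xyz_text: str) -> str:
    """Return the hexadecimal MD5 digest of an XYZ string"""
    return hashlib.md5(xyz_text.encode()).hexdigest()


original_hash = hash_xyz(water_xyz)
# Any change to the text, however small, yields a different digest
shifted_hash = hash_xyz(water_xyz.replace("0.119", "0.120"))
```

A text hash gives a cheap exact-match check, but it treats two numerically close structures as different.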

class examol.store.models.EnergyEvaluation(*, energy: float, config_name: str, charge: int, solvent: str | None = None, completed: datetime = None)[source]

Bases: BaseModel

Energy of a conformer under a certain condition

charge: int

Charge used when computing the energy

completed: datetime

When this energy computation was added

config_name: str

Configuration used to compute the energy

energy: float

Energy of the conformer (eV)

solvent: str | None

Solvent used, if any

class examol.store.models.Identifiers(*, smiles: str, inchi: str, pubchem_id: int | None = None)[source]

Bases: BaseModel

IDs known for a molecule

inchi: str

The InChI string

pubchem_id: int | None

PubChem ID, if known

smiles: str

A SMILES string

exception examol.store.models.MissingData(config_name: str = Ellipsis, charge: int = Ellipsis, solvent: str | None = Ellipsis)[source]

Bases: ValueError

No conformer or energy with the desired settings was found

charge: int = Ellipsis

Charge used when computing the energy

config_name: str = Ellipsis

Configuration used to compute the energy

solvent: str | None = Ellipsis

Solvent used, if any

class examol.store.models.MoleculeRecord(*, key: ConstrainedStrValue, identifier: Identifiers, names: list[str] = None, subsets: list[str] = None, conformers: list[Conformer] = None, properties: dict[str, dict[str, float]] = None)[source]

Bases: BaseModel

Defines whatever we know about a molecule

add_energies(result: SimResult, opt_steps: Collection[SimResult] = (), match_tol: float = 0.001) bool[source]

Add a new set of energies to a structure

Will add a new conformer if the structure does not yet exist

If provided, will match the energies of any materials within the optimization steps

Parameters:
  • result – Energy computation to be added

  • opt_steps – Optimization steps, if available

  • match_tol – Maximum absolute difference between XYZ coordinates to match

Returns:

Whether a new conformer was added
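One way to read match_tol is as a bound on the largest coordinate difference between two structures with the same atom ordering. The sketch below takes that reading. parse_xyz and same_structure are hypothetical helpers, and whether the comparison is inclusive is an assumption.

```python
def parse_xyz(xyz: str) -> list:
    """Read the atom lines of an XYZ string, skipping the count and comment lines"""
    atoms = []
    for line in xyz.strip().splitlines()[2:]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def same_structure(xyz_a: str, xyz_b: str, match_tol: float = 1e-3) -> bool:
    """True if elements agree and no coordinate differs by more than match_tol"""
    atoms_a, atoms_b = parse_xyz(xyz_a), parse_xyz(xyz_b)
    if [a[0] for a in atoms_a] != [b[0] for b in atoms_b]:
        return False
    max_diff = max(
        (abs(p - q) for a, b in zip(atoms_a, atoms_b) for p, q in zip(a[1:], b[1:])),
        default=0.0,
    )
    return max_diff <= match_tol


water = "3\nwater\nO 0.0 0.0 0.119\nH 0.0 0.763 -0.477\nH 0.0 -0.763 -0.477\n"
nudged = water.replace("0.119", "0.1195")  # Moves one atom by 0.0005
moved = water.replace("0.119", "0.219")    # Moves one atom by 0.1
```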

conformers: list[Conformer]

All known conformers for this molecule

find_lowest_conformer(config_name: str, charge: int, solvent: str | None, optimized_only: bool = True) tuple[Conformer, float][source]

Get the lowest-energy conformer of a molecule in a certain state, along with its energy

Parameters:
  • config_name – Name of the compute configuration

  • charge – Charge of the molecule

  • solvent – Solvent in which the molecule is dissolved

  • optimized_only – Only match conformers which were optimized with the specified configuration and charge

Returns:

  • Lowest-energy conformer

  • Energy of the structure (eV)

Raises:

NoSuchConformer – If we lack a conformer with these settings
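The selection logic, including the effect of optimized_only, can be sketched with plain dictionaries. The data layout and lowest_conformer are hypothetical and only illustrate the filtering and minimization steps.

```python
from typing import Optional

# Stand-in conformers: the settings used to relax each one, plus its energies (eV)
# keyed by (config_name, charge, solvent)
conformers = [
    {"name": "a", "relaxed_with": ("fast", 0), "energies": {("fast", 0, None): -10.0}},
    {"name": "b", "relaxed_with": ("fast", 1), "energies": {("fast", 1, None): -9.6}},
    {"name": "c", "relaxed_with": ("fast", 0), "energies": {("fast", 0, None): -10.3}},
    {"name": "d", "relaxed_with": ("slow", 1), "energies": {("fast", 1, None): -9.7}},
]


def lowest_conformer(conformers, config_name: str, charge: int,
                     solvent: Optional[str], optimized_only: bool = True):
    """Return (conformer, energy) for the lowest energy that matches the settings"""
    best = None
    for conformer in conformers:
        # Optionally skip conformers relaxed with a different configuration or charge
        if optimized_only and conformer["relaxed_with"] != (config_name, charge):
            continue
        energy = conformer["energies"].get((config_name, charge, solvent))
        if energy is not None and (best is None or energy < best[1]):
            best = (conformer, energy)
    if best is None:
        raise ValueError("No conformer with these settings")
    return best


neutral_best, neutral_energy = lowest_conformer(conformers, "fast", 0, None)
strict_best, _ = lowest_conformer(conformers, "fast", 1, None)
loose_best, _ = lowest_conformer(conformers, "fast", 1, None, optimized_only=False)
```

With optimized_only disabled, conformer "d" wins for charge 1 even though it was relaxed with a different configuration.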

classmethod from_identifier(mol_string: str)[source]

Parse the molecule from either the SMILES or InChI string

Parameters:

mol_string – Molecule to parse

Returns:

Empty record for this molecule

identifier: Identifiers

Collection of identifiers which define the molecule

key: str

InChI key

names: list[str]

Names this molecule is known by

properties: dict[str, dict[str, float]]

Properties available for the molecule

subsets: list[str]

List of subsets this molecule is part of

examol.store.recipes

Tools for computing the properties of molecules from their record

class examol.store.recipes.PropertyRecipe(name: str, level: str)[source]

Bases: object

Compute the property given a MoleculeRecord

Creating a New Recipe

Define a recipe by implementing four operations:

  1. __init__(): Take a user’s options for the recipe (e.g., what level of accuracy to use), then define a name and level for the recipe. Pass the name and level to the superclass’s constructor. It is better to avoid underscores in the name, as underscores are used in the names of simulation configurations.

  2. recipe(): Return a mapping from the different types of geometries, defined using RequiredGeometry, to the energies which must be computed for each geometry, defined using RequiredEnergy.

  3. compute_property(): Compute the property using the record and raise a ValueError, KeyError, or AssertionError if the record lacks the required information.

  4. from_name(): Restore a recipe from its name and level.
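The operations listed above can be sketched with stand-in classes. RecipeBase replaces PropertyRecipe, plain tuples replace RequiredGeometry and RequiredEnergy, and a dict of energies replaces MoleculeRecord, so this shows the shape of a recipe rather than working ExaMol code. The EnergyGap recipe and its "config-solvent" level format are invented for illustration.

```python
class RecipeBase:
    """Stand-in for PropertyRecipe that only stores the name and level"""

    def __init__(self, name: str, level: str):
        self.name = name
        self.level = level


class EnergyGap(RecipeBase):
    """Hypothetical recipe: energy in a solvent minus energy in vacuum"""

    def __init__(self, config_name: str, solvent: str):
        # Operation 1: take the user's options, then pass a name and level upward.
        # The name contains no underscores.
        super().__init__("gap", f"{config_name}-{solvent}")
        self.config_name = config_name
        self.solvent = solvent

    @property
    def recipe(self) -> dict:
        # Operation 2: map each required geometry to the energies needed on it
        geometry = (self.config_name, 0)
        return {geometry: [(self.config_name, 0, None), (self.config_name, 0, self.solvent)]}

    def compute_property(self, energies: dict) -> float:
        # Operation 3: compute the value; a missing energy raises KeyError
        vacuum, solvated = next(iter(self.recipe.values()))
        return energies[solvated] - energies[vacuum]

    @classmethod
    def from_name(cls, name: str, level: str) -> "EnergyGap":
        # Operation 4: rebuild the recipe from its name and level
        config_name, solvent = level.rsplit("-", 1)
        return cls(config_name, solvent)


gap = EnergyGap("fast", "acn")
restored = EnergyGap.from_name(gap.name, gap.level)
value = restored.compute_property({("fast", 0, None): -10.0, ("fast", 0, "acn"): -10.25})
```

Encoding every option in the level string is what lets from_name rebuild an equivalent recipe without any other state.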

compute_property(record: MoleculeRecord) float[source]

Compute the property

Parameters:

record – Data about the molecule

Returns:

Property value

classmethod from_name(name: str, level: str) PropertyRecipe[source]

Generate a recipe from the name

Parameters:
  • name – Name of the property

  • level – Level at which it is computed

lookup(record: MoleculeRecord, recompute: bool = False) float | None[source]

Look up the value of a property from a record

Parameters:
  • record – Record to be evaluated

  • recompute – Whether we should attempt to recompute the property beforehand

Returns:

Value of the property, if available, or None if not
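A sketch of the lookup behavior, assuming that the properties mapping is keyed first by property name and then by level (an assumption based on its dict[str, dict[str, float]] type). The function below is hypothetical. It takes a callable in place of the boolean recompute flag so that the example is self-contained.

```python
from typing import Callable, Optional


def lookup(properties: dict, name: str, level: str,
           recompute: Optional[Callable[[], float]] = None) -> Optional[float]:
    """Return the stored value, optionally trying to recompute it first"""
    if recompute is not None:
        try:
            # The right-hand side runs first, so a failure leaves the mapping untouched
            properties.setdefault(name, {})[level] = recompute()
        except (ValueError, KeyError, AssertionError):
            pass  # Not enough data yet: fall back to whatever is stored
    return properties.get(name, {}).get(level)


def failing_recompute() -> float:
    raise KeyError("missing energy")


stored = {"solvation": {"fast": -0.2}}
known = lookup(stored, "solvation", "fast")
unknown = lookup(stored, "redox", "fast")
fallback = lookup(stored, "solvation", "fast", recompute=failing_recompute)
fresh = lookup(stored, "redox", "fast", recompute=lambda: 1.5)
```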

property recipe: dict[RequiredGeometry, list[RequiredEnergy]]

List of the geometries required for this recipe and the energies which must be computed for them

suggest_computations(record: MoleculeRecord) list[SimulationRequest][source]

Generate a list of computations that should be performed next on a molecule

The list of computations may not be sufficient to complete a recipe. For example, you may need to first relax a structure and then compute the energy of the relaxed structure under different conditions.

Parameters:

record – Data about the molecule

Returns:

List of computations to perform
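The staged behavior described above, where a relaxation must finish before the energies on that geometry can be requested, can be sketched as follows. The suggest function, the tuple keys, and the dict-based requests are hypothetical stand-ins for the recipe, RequiredGeometry, RequiredEnergy, and SimulationRequest.

```python
def suggest(recipe: dict, relaxed: set, energies: set) -> list:
    """List the next computations given which geometries and energies exist"""
    requests = []
    for geometry, needed in recipe.items():
        if geometry not in relaxed:
            # The geometry is missing, so energies on it cannot be requested yet
            requests.append({"optimize": True, "config_name": geometry[0],
                             "charge": geometry[1], "solvent": None})
            continue
        for energy in needed:
            if (geometry, energy) not in energies:
                requests.append({"optimize": False, "config_name": energy[0],
                                 "charge": energy[1], "solvent": energy[2]})
    return requests


# One neutral geometry on which both the neutral and the charged energy are needed
recipe = {("fast", 0): [("fast", 0, None), ("fast", 1, None)]}

first_round = suggest(recipe, relaxed=set(), energies=set())
second_round = suggest(
    recipe,
    relaxed={("fast", 0)},
    energies={(("fast", 0), ("fast", 0, None))},  # The relaxation supplied this energy
)
final_round = suggest(
    recipe,
    relaxed={("fast", 0)},
    energies={(("fast", 0), ("fast", 0, None)), (("fast", 0), ("fast", 1, None))},
)
```

The first round returns only the relaxation, which is why a single call may not be sufficient to complete a recipe.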

update_record(record: MoleculeRecord) float[source]

Compute a property and update the record

Parameters:

record – Record to be updated

Returns:

Value of the property being computed

class examol.store.recipes.RedoxEnergy(charge: int, energy_config: str, vertical: bool = False, solvent: str | None = None)[source]

Bases: PropertyRecipe

Compute the redox energy for a molecule

The level is named by the configuration used to compute the energy, whether a solvent was included, and whether we are computing the vertical or adiabatic energy.

Parameters:
  • charge – Amount the charge of the molecule should change by

  • energy_config – Configuration used to compute the energy

  • vertical – Whether to compute the vertical rather than the adiabatic energy

  • solvent – Solvent in which molecule is dissolved, if any
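The difference between the vertical and adiabatic energies can be shown as simple arithmetic. In the vertical case the charged energy is evaluated on the neutral geometry. In the adiabatic case it is evaluated on a geometry relaxed in the charged state. The function, the dictionary keys, the numbers, and the sign convention (charged minus neutral) are assumptions for illustration.

```python
def redox_energy(energies: dict, charge: int, vertical: bool) -> float:
    """Energy of the charged state minus the neutral state, in the units given"""
    neutral = energies[("neutral_geometry", 0)]
    # Vertical: reuse the neutral geometry. Adiabatic: use the relaxed charged geometry.
    geometry = "neutral_geometry" if vertical else "charged_geometry"
    return energies[(geometry, charge)] - neutral


# Made-up energies in eV, keyed by (geometry, charge)
example_energies = {
    ("neutral_geometry", 0): -10.0,
    ("neutral_geometry", 1): -4.0,
    ("charged_geometry", 1): -4.25,
}
vertical_value = redox_energy(example_energies, charge=1, vertical=True)
adiabatic_value = redox_energy(example_energies, charge=1, vertical=False)
```

The adiabatic value is the smaller of the two here because relaxing the charged geometry can only lower the energy of the charged state.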

compute_property(record: MoleculeRecord) float[source]

Compute the property

Parameters:

record – Data about the molecule

Returns:

Property value

classmethod from_name(name: str, level: str) RedoxEnergy[source]

Generate a recipe from the name

Parameters:
  • name – Name of the property

  • level – Level at which it is computed

property recipe: dict[RequiredGeometry, list[RequiredEnergy]]

List of the geometries required for this recipe and the energies which must be computed for them

class examol.store.recipes.RequiredEnergy(config_name: str = Ellipsis, charge: int = Ellipsis, solvent: str | None = None)[source]

Bases: object

Energy computation level required for a geometry

charge: int = Ellipsis

Charge on the molecule

config_name: str = Ellipsis

Level of computation required for the energy

solvent: str | None = None

Name of solvent, if any

class examol.store.recipes.RequiredGeometry(config_name: str = Ellipsis, charge: int = Ellipsis)[source]

Bases: object

Geometry level required for a recipe

charge: int = Ellipsis

Charge on the molecule used during optimization

config_name: str = Ellipsis

Level of computation required for this geometry

class examol.store.recipes.SimulationRequest(xyz: str, optimize: bool = Ellipsis, config_name: str = Ellipsis, charge: int = Ellipsis, solvent: str | None = Ellipsis)[source]

Bases: object

Request for a specific simulation type

charge: int = Ellipsis

Charge on the molecule

config_name: str = Ellipsis

Name of the computation configuration

optimize: bool = Ellipsis

Whether to perform an optimization

solvent: str | None = Ellipsis

Name of solvent, if any

xyz: str

XYZ structure to use as the starting point

class examol.store.recipes.SolvationEnergy(config_name: str, solvent: str)[source]

Bases: PropertyRecipe

Compute the solvation energy in kcal/mol

Parameters:
  • config_name – Name of the configuration used to compute energy

  • solvent – Target solvent
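Because conformer energies are stored in eV while this property is reported in kcal/mol, a unit conversion is involved. The sketch below assumes the solvation energy is the energy in the solvent minus the energy in vacuum on the same geometry. The function name and the input numbers are made up.

```python
EV_TO_KCAL_PER_MOL = 23.060548  # 1 eV per molecule expressed in kcal/mol


def solvation_energy(vacuum_ev: float, solvated_ev: float) -> float:
    """Solvation energy in kcal/mol from two energies in eV"""
    return (solvated_ev - vacuum_ev) * EV_TO_KCAL_PER_MOL


# A molecule stabilized by 0.25 eV in the solvent
example_solvation = solvation_energy(vacuum_ev=-10.0, solvated_ev=-10.25)
```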

compute_property(record: MoleculeRecord) float[source]

Compute the property

Parameters:

record – Data about the molecule

Returns:

Property value

classmethod from_name(name: str, level: str) SolvationEnergy[source]

Generate a recipe from the name

Parameters:
  • name – Name of the property

  • level – Level at which it is computed

property recipe: dict[RequiredGeometry, list[RequiredEnergy]]

List of the geometries required for this recipe and the energies which must be computed for them