AI enhances CMI’s computer-aided molecular-design software

Visual representations of the descriptors included to train LOGKPREDICT: A, atom-to-atom connectivity; B, coordination number; C, rotatable bonds; D, ionic radius; E, strain energy; F, Hancock covalency; G, Hancock electrostatics.

CMI researchers at Ames Laboratory conducted the research for this highlight in collaboration with the Supramolecular Design Institute.

Innovation
Created AI/ML software that allows the prediction of metal-ligand complex formation constants from physico-chemical descriptors.
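A minimal sketch of the descriptor-to-log K regression idea, assuming a tabular dataset whose columns are the descriptors pictured above (atom-to-atom connectivity, coordination number, rotatable bonds, ionic radius, strain energy, Hancock covalency, Hancock electrostatics) plus an experimental log K column. The file name, column names, and choice of a random-forest regressor are illustrative assumptions, not LOGKPREDICT's actual pipeline.

```python
# Hypothetical sketch: predict metal-ligand log K from physico-chemical
# descriptors with an off-the-shelf regressor. Column names, the CSV path,
# and the model choice are illustrative; LOGKPREDICT's own pipeline may differ.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

DESCRIPTORS = [
    "connectivity",            # A: atom-to-atom connectivity
    "coordination_number",     # B: coordination number
    "rotatable_bonds",         # C: rotatable bonds
    "ionic_radius",            # D: ionic radius
    "strain_energy",           # E: strain energy
    "hancock_covalency",       # F: Hancock covalency
    "hancock_electrostatics",  # G: Hancock electrostatics
]

data = pd.read_csv("logk_dataset.csv")          # hypothetical file
X, y = data[DESCRIPTORS].values, data["log_K"].values

# 2:1 train/test split, mirroring the evaluation described in the figure caption
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```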

Achievements
Novel LOGKPREDICT code was interfaced with the existing structure-based design code HostDesigner to provide enhanced scoring of designed ligand architectures; a workflow sketch follows the links below.

  • LOGKPREDICT, open-source software, on github.com/

  • HostDesigner, open-source software, on sourceforge.net/
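To illustrate the kind of interfacing described above, the following hypothetical sketch re-scores candidate ligands produced by a structure-based design code with a trained log K model. The functions load_candidates() and compute_descriptors() are placeholders standing in for whatever the real LOGKPREDICT/HostDesigner interface provides; none of these names are actual APIs of either package.

```python
# Hypothetical workflow sketch: rank design-code output by predicted log K.
# load_candidates() and compute_descriptors() are stubs, not real APIs.
from typing import Dict, List

def load_candidates(path: str) -> List[Dict]:
    """Parse candidate ligand structures from a design-code output file (stub)."""
    raise NotImplementedError("replace with the real structure parser")

def compute_descriptors(candidate: Dict) -> List[float]:
    """Compute the physico-chemical descriptors for one candidate (stub)."""
    raise NotImplementedError("replace with the real descriptor calculator")

def rescore(candidates: List[Dict], model) -> List[Dict]:
    """Attach a predicted log K to each candidate and return them best-first."""
    for c in candidates:
        c["predicted_logK"] = float(model.predict([compute_descriptors(c)])[0])
    return sorted(candidates, key=lambda c: c["predicted_logK"], reverse=True)
```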

Significance and Impact
Designed ligands can now be rank-ordered with respect to selectivity, log(KM1/KM2), facilitating the design of improved metal ion separating agents prior to synthesis and testing.
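Because log(K_M1/K_M2) = log K_M1 - log K_M2, the selectivity used for rank-ordering is simply the difference of two predicted log K values. A short illustrative snippet (the ligand names and numbers are made up for the example, not real predictions):

```python
# Rank ligand designs by predicted selectivity for metal M1 over M2.
# log(K_M1 / K_M2) = log K_M1 - log K_M2, so larger values favor M1.
# The entries below are placeholder values for illustration only.
predicted = {
    "ligand_A": {"logK_M1": 8.2, "logK_M2": 5.1},
    "ligand_B": {"logK_M1": 7.4, "logK_M2": 6.9},
}

def selectivity(entry):
    return entry["logK_M1"] - entry["logK_M2"]

for name, entry in sorted(predicted.items(),
                          key=lambda kv: selectivity(kv[1]), reverse=True):
    print(f"{name}: log(K_M1/K_M2) = {selectivity(entry):.2f}")
```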

Hub Goal Addressed
Increase speed of discovery and integration. Highly selective separation from complex sources allows for greater supply diversification.

Representative graph of predicted vs. experimental log K values for more than 1,600 data points. Ten trials produce an average Pearson correlation of 0.89 (range 0.87–0.91), with mean log K error = 0.10 and mean absolute log K error = 0.86. For each trial, the data are randomly split 2:1 between training and test sets, respectively.
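A sketch of the evaluation protocol the caption describes: ten trials, each with a random 2:1 train/test split, reporting the Pearson correlation, mean signed error, and mean absolute error averaged over the trials. The regressor and the X, y arrays are placeholders; this is an assumed reconstruction, not LOGKPREDICT's own evaluation code.

```python
# Hypothetical reconstruction of the ten-trial evaluation in the caption.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def evaluate(X, y, n_trials=10):
    stats = []
    for trial in range(n_trials):
        # 2:1 split into training and test sets for each trial
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=1/3, random_state=trial)
        model = RandomForestRegressor(n_estimators=500, random_state=trial)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        r, _ = pearsonr(y_te, pred)
        stats.append((r, np.mean(pred - y_te), np.mean(np.abs(pred - y_te))))
    r, me, mae = np.mean(stats, axis=0)
    print(f"mean Pearson r = {r:.2f}, mean error = {me:.2f}, MAE = {mae:.2f}")
```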