Quantifying the chemical beauty of drugs

Drug-likeness is a key consideration when selecting compounds during the early stages of drug discovery. However, evaluation of drug-likeness in absolute terms does not adequately reflect the whole spectrum of compound quality. More worryingly, widely used rules may inadvertently foster undesirable molecular property inflation as they permit the encroachment of rule-compliant compounds towards their boundaries. We propose a measure of drug-likeness

Improved Chemical Text Mining of Patents with Infinite Dictionaries and Automatic Spelling Correction

The text mining of patents of pharmaceutical interest poses a number of unique challenges not encountered in other fields of text mining. Unlike fields such as bioinformatics, where the number of terms of interest is enumerable and essentially static, systematic chemical nomenclature can describe an infinite number of molecules. Hence, the dictionary- and ontology-based techniques that are commonly used for gene names, diseases, species, etc., have

Making every SAR point count: the development of Chemistry Connect for the large-scale integration of structure and bioactivity data

The increase in drug research output from patent applications, together with the expansion of public data collections, such as ChEMBL and PubChem BioAssay, has made it essential for pharmaceutical companies to integrate both internal and external ‘SAR estate’. The AstraZeneca response has been the development of an enterprise application, Chemistry Connect, containing 45 million unique chemical structures from 18 internal and external data sources.

Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds

Background: Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert

Investigation of the Relationship between Topology and Selectivity for Druglike Molecules

There is a strong interest in drug discovery and development to advance the understanding of pharmacological promiscuity. Improved understanding of how a molecular structure is related to promiscuity could help to reduce the attrition of compounds in the drug discovery process. For this purpose, a descriptor is introduced that describes the structural complexity of a compound based on the size of its molecular framework (MF) in relation to its overall

Physicochemical property profiles of marketed drugs, clinical candidates and bioactive compounds

We compared several simple physicochemical properties between marketed drugs, clinical candidates and bioactive compounds using commercially available databases (GVKBIO, Hyderabad, India). In contrast to previous studies, this comparison was performed at the individual target level. Confirming earlier studies, this shows that marketed drugs have, on average and taken as a single set, lower physicochemical property values than the corresponding

Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds

Background: Since 2004, public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen because of their bioactive content, availability

Novel Chemical Space Exploration via Natural Products

Natural products (NPs) are a rich source of novel compound classes and new drugs. In the present study we have used the chemical space navigation tool ChemGPS-NP to evaluate the chemical space occupancy by NPs and bioactive medicinal chemistry compounds from the database WOMBAT. The two sets differ notably in coverage of chemical space, and tangible leadlike NPs were found to cover regions of chemical space that lack representation in WOMBAT. Property

ChemGPS-NPWeb: chemical space navigation online

The Internet has become a central source of information, tools, and services that facilitate the work of medicinal chemists and drug discoverers worldwide. In this paper we introduce a web-based public tool, ChemGPS-NPWeb, for comprehensive chemical space navigation and exploration in terms of global mapping onto a consistent, eight-dimensional map over structure-derived physicochemical characteristics. ChemGPS-NPWeb can assist

