SIMPLIFIED MOLECULAR INPUT LINE ENTRY SPECIFICATION


The 'simplified molecular input line entry specification' or 'SMILES' is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc.). More recently, IUPAC introduced InChI as a standard for representing chemical structures. SMILES is generally considered slightly more human-readable than InChI; it also has a wide base of software support and extensive theoretical backing (e.g., graph theory).

Contents
''Canonical SMILES'' and ''Isomeric SMILES''
Graph-based definition
Examples
Isomeric SMILES
Extensions
Conversion
See also
References
External links
Specifications
SMILES related software utilities

''Canonical SMILES'' and ''Isomeric SMILES''


The term 'Canonical SMILES' refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
The term 'Isomeric SMILES' refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.

Graph-based definition


In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbols of the nodes encountered in a depth-first traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are added to mark the two nodes that were connected. Parentheses are used to indicate points of branching on the tree.
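The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `write_smiles` and the input layout (a list of element symbols plus a list of `(i, j, bond_symbol)` tuples) are assumptions made for this example. It assumes a connected graph with hydrogens already removed, and it ignores aromaticity, charges, and canonical ordering.

```python
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}

def write_smiles(atoms, bonds, start=0):
    """Print a hydrogen-free molecular graph as a (non-canonical) SMILES string.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, symbol), symbol being "" (single), "=" or "#"
    """
    adj = {i: [] for i in range(len(atoms))}
    for i, j, sym in bonds:
        adj[i].append((j, sym))
        adj[j].append((i, sym))

    parent = {start: None}                 # also serves as the visited set
    children = {i: [] for i in adj}        # spanning-tree edges
    closures = {i: "" for i in adj}        # ring-closure labels per atom
    seen_closures = set()

    def explore(a):
        # Pass 1: depth-first search that breaks each cycle at a back edge.
        for b, sym in adj[a]:
            if b == parent[a]:
                continue
            if b not in parent:
                parent[b] = a
                children[a].append((b, sym))
                explore(b)
            elif frozenset((a, b)) not in seen_closures:
                seen_closures.add(frozenset((a, b)))
                digit = len(seen_closures)
                if digit > 9:
                    raise ValueError("this sketch supports at most nine ring closures")
                closures[b] += sym + str(digit)   # b is the ancestor, written first
                closures[a] += str(digit)

    def emit(a):
        # Pass 2: print the tree; every child but the last becomes a branch.
        sym = atoms[a]
        text = sym if sym in ORGANIC_SUBSET else "[" + sym + "]"
        text += closures[a]
        kids = children[a]
        for b, bsym in kids[:-1]:
            text += "(" + bsym + emit(b) + ")"
        if kids:
            b, bsym = kids[-1]
            text += bsym + emit(b)
        return text

    explore(start)
    return emit(start)

ring = [(i, (i + 1) % 6, "") for i in range(6)]
print(write_smiles(["C"] * 6, ring))                       # C1CCCCC1
fluoroform = (["C", "F", "F", "F"], [(0, 1, ""), (0, 2, ""), (0, 3, "")])
print(write_smiles(*fluoroform))                           # C(F)(F)F
print(write_smiles(*fluoroform, start=1))                  # FC(F)F
```

Starting the traversal from a different atom yields a different but equally valid string for the same molecule, which is why a separate canonicalization step is needed when a unique representation is required.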

Examples


Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O and that for ethanol is CCO.
The double-bonded carbon dioxide is represented as O=C=O and the triple-bonded hydrogen cyanide as C#N.
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform, which could also be described by the non-canonical formula FC(F)F.
Cyclohexane is represented as C1CCCCC1, the idea being that the two 'number ones' label the same position in the molecule, thus forming a ring with six carbons. Note that the label is the numeral (in this case the 1) rather than the combination of 'C1'.
Aromatic C, O, S and N atoms are shown in lower case as 'c', 'o', 's' and 'n' respectively. Bonds in an aromatic cycle are rarely marked explicitly except in SMARTS search patterns. Thus benzene is c1ccccc1.
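The notation used in these examples can be split into tokens with a short regular-expression scanner. The sketch below covers only the features described above (bracket atoms, the organic subset, lower-case aromatic atoms, double and triple bonds, branches, and single-digit ring labels); the names `TOKEN_SPEC`, `tokenize` and `count_atoms` are illustrative assumptions, not part of any SMILES toolkit.

```python
import re

# Order matters: "Cl" and "Br" must be tried before the one-letter symbols.
TOKEN_SPEC = [
    ("bracket_atom", r"\[[^\]]+\]"),
    ("organic_atom", r"Cl|Br|[BCNOPSFI]"),
    ("aromatic_atom", r"[cnos]"),
    ("bond", r"[=#]"),
    ("branch", r"[()]"),
    ("ring_label", r"\d"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        m = TOKEN_RE.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def count_atoms(smiles):
    """Count the explicitly written (non-hydrogen) atoms."""
    return sum(1 for kind, _ in tokenize(smiles) if kind.endswith("_atom"))

print(tokenize("C1CCCCC1")[:2])   # [('organic_atom', 'C'), ('ring_label', '1')]
print(count_atoms("c1ccccc1"))    # 6
print(tokenize("[OH-]"))          # [('bracket_atom', '[OH-]')]
```

The ring label comes out as its own token, separate from the atom that precedes it, which mirrors the point that the label is the numeral and not the combination 'C1'.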

Isomeric SMILES

Representation of cis-difluoroethene

Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of ''trans''-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of ''cis''-difluoroethene, in which the fluorine atoms are on the same side of the double bond, as shown in the figure.
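For the simplest case, one substituent written before the double bond and one after it, the rule reduces to comparing the two direction characters "/" and "\": equal characters mean ''trans'', different characters mean ''cis''. The following sketch applies that rule; the function name `double_bond_geometry` is an assumption for this example, and the pattern deliberately handles only strings of the form X/C=C/Y, not general isomeric SMILES.

```python
import re

# Matches only  X<dir>C=C<dir>Y  where X and Y are single element symbols.
SIMPLE_ALKENE = re.compile(r"^([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)$")

def double_bond_geometry(smiles):
    """Return 'cis', 'trans', or 'unspecified' for a simple 1,2-disubstituted ethene."""
    m = SIMPLE_ALKENE.match(smiles)
    if m is None:
        return "unspecified"
    first_dir, second_dir = m.group(2), m.group(3)
    return "trans" if first_dir == second_dir else "cis"

print(double_bond_geometry("F/C=C/F"))     # trans
print(double_bond_geometry(r"F/C=C\F"))    # cis
print(double_bond_geometry("FC=CF"))       # unspecified
```

Because the backslash is an escape character in many programming languages, SMILES strings containing it are written here as Python raw strings.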

Extensions


'SMARTS' is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. This is used in specifying search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when, in fact, it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
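The difference between string matching and graph matching can be illustrated with a small backtracking search for a subgraph isomorphism. This is a simplified sketch under stated assumptions: both query and target are taken to be already parsed into atom lists and `(i, j, bond)` tuples, "*" stands for any atom and "~" for any bond, and the name `find_substructure` is hypothetical. Production toolkits use considerably more refined algorithms.

```python
def adjacency(n_atoms, bonds):
    """Build per-atom dictionaries mapping neighbour index -> bond symbol."""
    adj = [dict() for _ in range(n_atoms)]
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return adj

def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return one mapping {query atom -> target atom}, or None if there is no match."""
    q_adj = adjacency(len(q_atoms), q_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)

    def compatible(q, t, mapping):
        if q_atoms[q] != "*" and q_atoms[q] != t_atoms[t]:
            return False
        # Every bond from q to an already-mapped query atom must exist in the target.
        for n, q_bond in q_adj[q].items():
            if n in mapping:
                t_bond = t_adj[t].get(mapping[n])
                if t_bond is None or (q_bond != "~" and q_bond != t_bond):
                    return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return dict(mapping)
        q = len(mapping)                      # assign query atoms in index order
        for t in range(len(t_atoms)):
            if t not in mapping.values() and compatible(q, t, mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]                # backtrack
        return None

    return extend({})

# Target: propionic acid, CCC(=O)O
target = (["C", "C", "C", "O", "O"],
          [(0, 1, "-"), (1, 2, "-"), (2, 3, "="), (2, 4, "-")])
# Query: a carboxyl-like pattern, C(=O)O
query = (["C", "O", "O"], [(0, 1, "="), (0, 2, "-")])
print(find_substructure(*query, *target))     # {0: 2, 1: 3, 2: 4}
```

The query string C(=O)O does occur as a substring of CCC(=O)O, but the same molecule written as OC(=O)CC would defeat a substring test while still matching at the graph level.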

Conversion


SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches. There are many downloadable and web-based conversion utilities.

See also



SYBYL Line Notation (another line notation)

Molecular Query Language - query language allowing also numerical properties, e.g. physicochemical values or distances

Chemistry Development Kit (2D layout and conversion)

International Chemical Identifier (InChI), the free and open alternative to SMILES by the IUPAC.

OpenBabel, JOELib, OELib (conversion)

References



Anderson, E., Veith, G. D., and Weininger, D. (1987). SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804.

Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31-36.

Helson, H. E. (1999). Structure Diagram Generation. In Rev. Comput. Chem., edited by Lipkowitz, K. B. and Boyd, D. B. Wiley-VCH, New York, pages 313-398.

External links


Specifications


"SMILES - A Simplified Chemical Language"

"SMARTS - SMILES Extension"

Daylight SMILES tutorial

Parsing SMILES

SMILES related software utilities


smi23d - 3D Coordinate Generation

Daylight Depict

CACTVS at NCI

PubChem online molecule editor

JME molecule editor

ACD/ChemSketch

CSMILES aware Java based molecule editor and 2D/3D viewer

Smormo-Ed: a molecule editor for Linux that can read and write SMILES

InChI.info: an unofficial InChI website featuring on-line converter from InChI and SMILES to molecular drawings

This article provided by Wikipedia.