TEXT NORMALIZATION

Text normalization is the process of transforming text into a single consistent form that it may not have had before. It is often performed before the text is processed further, for example before generating synthesized speech, automated language translation, storage in a database, or comparison.
Examples of text normalization:

★ Unicode normalization

★ converting all letters to lower or upper case

★ removing punctuation

★ removing accent marks and other diacritics from letters

★ expanding abbreviations

While this may be done manually, and usually is for ad hoc and personal documents, many programming languages provide mechanisms that support text normalization.
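
As an illustration of the examples listed above, the following is a minimal Python sketch that chains them together using only the standard library. The function names `normalize` and `strip_diacritics` and the `ABBREVIATIONS` table are hypothetical, introduced here for illustration only; the order of the steps is one reasonable choice, not a standard.

```python
import string
import unicodedata

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}


def strip_diacritics(text: str) -> str:
    """Remove accent marks by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    """Apply the listed normalization steps in one (assumed) order."""
    # Unicode normalization: compose characters into canonical form (NFC).
    text = unicodedata.normalize("NFC", text)
    # Convert all letters to lower case.
    text = text.lower()
    # Expand abbreviations word by word, before punctuation is removed,
    # so that the trailing period can still be matched.
    text = " ".join(ABBREVIATIONS.get(word, word) for word in text.split())
    # Remove accent marks and other diacritics.
    text = strip_diacritics(text)
    # Remove punctuation (ASCII punctuation only in this sketch).
    return text.translate(str.maketrans("", "", string.punctuation))


print(normalize("Dr. José lives on Café St."))
# prints: doctor jose lives on cafe street
```

Real applications usually need only some of these steps, and the right choice depends on the purpose: stripping diacritics, for instance, may help loose comparison but loses information that speech synthesis or translation would need.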

This article provided by Wikipedia.
