PLAIN TEXT


Some plain text displayed by the command cat in a Linux xterm window

In computing, 'plain text' is textual material in a computer file that is unformatted and can be read, with very little processing, by simple tools such as line-printing commands: type in a Windows DOS window, or cat in a Unix terminal window. This means that there are neither structural tags, such as chapter and heading marks around text segments, nor typographic markers, such as bold face.
The purpose of using plain text is freedom from dependence on particular programs, each of which requires its own structural tags in its own order; the price is some sacrifices and limitations. Thus, "I'm keeping that letter in plain text form until someone insists on getting it in a particular format" is a philosophy commonly adhered to amongst computer technicians in order to avoid later incompatibilities. In practice, many computer programs are capable of importing text without formatting.
The related term plaintext is most commonly used in a cryptographic context, while cleartext usually refers to a lack of protection from eavesdropping. The terms are often confused with one another, especially by those new to computers, cryptography, or data communications. Plain text is thus, in effect, the technical user's way of regarding any file. In a sense there is no plain text, since everything in a computer is ultimately stored as patterns of two voltage levels (for example 0 V and 3.3 V), which humans cannot sense directly. The term plain text nevertheless distinguishes text that many computer programs can process into a form that can be read and understood by any human proficient in the language of the text.

Contents
Applications
Editing
Usage
Encoding
Character encodings
Control codes
See also

Applications


Editing

Main article: Text editor

Plain text files can be opened, read, and edited with text editors. Examples include Notepad (on Microsoft Windows), edlin and edit (on MS-DOS), ed, vi, and Emacs (on Unix, Linux, and elsewhere), pico, nano, SimpleText (on Mac OS), and TextEdit (on Mac OS X).

Usage

Plain text files are almost universal in programming: a source code file containing instructions in a programming language is almost always a plain text file. Plain text is also commonly used for configuration files, which a program reads at startup to restore saved settings. XML has become a widespread alternative for such files.
In a way, HTML, SGML, and XML files are also regarded as plain text, since they use no control codes (see below), although these formats do contain real structural tags. To an SGML or XML author, the tags are "human readable", since the author understands the structure by reading the format. This illustrates a complication in computer science terminology: what counts as plain text depends on the point of view.
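The two points of view can be illustrated with a short Python sketch. The sample document and the variable names are invented for illustration; the only library used is the standard xml.etree.ElementTree module. The same characters are first treated as plain text and then parsed as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
document = "<letter><heading>Hello</heading><body>Plain words.</body></letter>"

# Viewed as plain text: a sequence of characters, tags included.
tag_characters = document.count("<") + document.count(">")

# Viewed as markup: the same characters parsed into a structure.
root = ET.fromstring(document)
heading = root.find("heading").text

# The text content alone, with the structural tags stripped away.
words_only = "".join(root.itertext())
```

A tool such as cat would show the whole string, angle brackets and all, while an XML-aware program would work with root and its children.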

Encoding


Character encodings

Main article: Character encoding

Text was once commonly encoded in ASCII, a 7-bit code allowing 128 values. Each character was typically stored or sent in 8 bits, with the 8th bit used as a parity (error-checking) bit when transferring a file. This allowed only the ordinary Latin alphabet, digits, transfer control codes, parentheses, and punctuation, which especially annoyed Portuguese and Swedish computer users. When data transfer became more reliable, the remaining 128 values were therefore put to use, but differently from place to place and in a way that made multilingual texts impossible to encode. Eventually Unicode was defined; its code space of 1,114,112 code points is intended to cover modern writing systems and many extinct ones. For example, Unicode encodes Chinese, Hebrew, and Cyrillic as well as Latin. Some of these scripts can be complicated to process correctly, but the text still contains no structural data, such as bold start and end markers, and is therefore plain text.
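The following Python sketch illustrates these points under a few assumptions: the sample word is invented, the check bit is modelled as even parity, and Latin-1 and code page 437 stand in for two of the many incompatible 8-bit extensions. The function with_even_parity is a hypothetical helper, not part of any library.

```python
def with_even_parity(code: int) -> int:
    """Set bit 8 so that the 8-bit value has an even number of 1 bits."""
    ones = bin(code & 0x7F).count("1")
    return (code & 0x7F) | ((ones % 2) << 7)

text = "Smörgåsbord"  # hypothetical sample with two Swedish letters

# ASCII defines only 128 values, so 'ö' and 'å' cannot be encoded.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# One 8-bit extension (Latin-1) stores each character in one byte.
latin1_bytes = text.encode("latin-1")

# Reading those bytes with a different 8-bit extension garbles them.
garbled = latin1_bytes.decode("cp437")

# UTF-8, an encoding of Unicode, spends two bytes on 'ö' and on 'å'.
utf8_bytes = text.encode("utf-8")

# Unicode code points run from 0 to 0x10FFFF inclusive.
code_space = 0x10FFFF + 1
```

Here garbled differs from text although no error is raised, which is the kind of silent mismatch that made multilingual text awkward before Unicode.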

Control codes

Main article: Newline

The codes below the space character (32 = 0x20) do not represent displayable characters but are used as control characters. They carry a variety of meanings; for example, the code NUL (= 0 = 0x00, sometimes denoted Ctrl-@) marks the end of a string in the programming language C and its successors. The most troublesome of these codes are LF (= LINE FEED = 10 = 0x0A) and CR (= CARRIAGE RETURN = 13 = 0x0D). Windows and OS/2 use the sequence CR, LF to represent a newline, whereas Unix and its relatives use a single LF. This was once a (tiny) source of irritation when transferring files between Windows and Unix systems, but today most computer programs handle the difference seamlessly.
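As a minimal sketch of how a program might smooth over these differences, the Python helpers below convert between the two newline conventions, list the control codes in a byte string, and cut a buffer at the first NUL the way C string functions would. All function names are hypothetical and chosen for illustration only.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Replace each CR LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")

def to_windows_newlines(data: bytes) -> bytes:
    """Use CR LF for every newline, without doubling existing CRs."""
    return to_unix_newlines(data).replace(b"\n", b"\r\n")

def control_codes(data: bytes) -> list[int]:
    """List the byte values below Space (0x20) found in data."""
    return [b for b in data if b < 0x20]

def c_string(buffer: bytes) -> bytes:
    """Return the bytes before the first NUL (0x00), if any."""
    return buffer.split(b"\x00", 1)[0]
```

Working on bytes rather than decoded strings keeps the conversion independent of the character encoding, as long as the encoding represents CR and LF as the single bytes 0x0D and 0x0A (true of ASCII and UTF-8).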

See also



E-text

MIME Content-type

Formatted text

Filename extension

File format

Binary file

Text file

Editor wars

File system

Configuration file

Source code

This article provided by Wikipedia.