SPACE (PUNCTUATION)
In writing, a space is any blank (unwritten) zone between written sections. The term usually refers to a blank zone used to separate words (the interword space). Conventions about the presence and size of interword spaces vary from language to language and can be quite complex. Computing character sets provide many different space characters for representing spaces of different widths and meanings.
Use of the Space in Natural Languages
Main article: Interword separation
Modern English uses a standard space to separate words. Conventions vary regarding spacing around punctuation such as the full stop or period (see full stop and French spacing), exclamation mark, question mark, and dash (see below).
Not all languages use spaces between words. Spaces were not used to separate words in Latin until roughly AD 600–800. Ancient Hebrew and Arabic did use spaces, partly to compensate in clarity for the lack of written vowels. Traditionally, CJK languages were written without spaces: modern Chinese and Japanese (except Japanese written with little or no kanji) still are, but modern Korean uses spaces.
Use of the Space in Computing
In programming language syntax, spaces are frequently used to separate tokens explicitly. Aside from this use, modern programming languages usually ignore spaces and other whitespace characters. Exceptions include Haskell, ABC, and Python, which use the depth of indentation to mark the bounds of a block, and Whitespace, a whimsical language in which whitespace characters are the only meaningful syntactic elements.
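A minimal Python sketch shows both sides of this. The helper name `parses` is hypothetical. Extra spaces between tokens do not change whether a statement is valid, but a line indented to a level that no enclosing block opened makes the program invalid.

```python
def parses(source: str) -> bool:
    """Return True if `source` compiles as a Python module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True

# Spacing between tokens is free-form: both spellings are the same statement.
print(parses("x=1+2\n"), parses("x   =   1 + 2\n"))  # True True

# Indentation is not: it defines where the block under "if" ends.
consistent = "if True:\n    a = 1\n    b = 2\n"
inconsistent = "if True:\n    a = 1\n  b = 2\n"  # dedent to a level never opened
print(parses(consistent), parses(inconsistent))  # True False
```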
Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and how they represent spaces at the ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an interpunct or other symbols. Many different characters (described below) could be used to produce spaces, and non-character functions (such as margins and tab settings) can also affect whitespace.
Space characters and digital typography
The variable-width general-purpose space
In computer character encodings, there is a normal general-purpose space (Unicode character U+0020; 32 decimal) whose width varies according to the design of the typeface. Typical values range from 1/5 em to 1/3 em. In digital typography an em equals the nominal size of the font, so for a 10-point font the space will probably be between 2 and 3.3 points. Sophisticated fonts have differently sized spaces for bold, italic, and small-caps faces, and compositors often adjust the width of the space manually depending on the size and prominence of the text.
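The arithmetic above can be sketched in Python. The function name and its default fractions (1/5 and 1/3 of an em) are illustrative assumptions taken from the typical range quoted. They are not properties read from any real font.

```python
def space_width_range(font_size_pt: float, narrow: float = 1 / 5, wide: float = 1 / 3):
    """Return the (narrowest, widest) typical word-space width in points.

    One em equals the nominal font size, so a 10-point font has a
    10-point em; the space is taken to be a fraction of that em.
    """
    em = font_size_pt
    return em * narrow, em * wide

low, high = space_width_range(10)
print(f"{low:.1f} to {high:.1f} pt")  # 2.0 to 3.3 pt
```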
In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.
(In monospaced proofreading copy, only em and en spaces are represented using this character, called an em quad or an en quad respectively; other types of space are represented with a number sign.)
Breaking and non-breaking spaces
When rendered, the generic Unicode space is often treated as insignificant at the end of a line of text or within a sequence of whitespace characters, so it may be omitted or "collapsed" in those positions. The non-breaking space (U+00A0, 160 decimal) renders the same as a normal space but is expressly non-collapsible, and a line may not be broken at it. It is often used to prevent line wrapping or to indent text, though some World Wide Web authorities discourage using it for those purposes.
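The difference between breaking and non-breaking spaces can be illustrated with a toy greedy line wrapper. The function `wrap` is hypothetical and deliberately simplified. Real renderers follow the Unicode line-breaking rules. Here, only U+0020 is a break opportunity, and the space at a break is dropped rather than carried to either line.

```python
def wrap(text: str, width: int) -> list[str]:
    """Greedy line wrapping that breaks only at U+0020 SPACE.

    U+00A0 NO-BREAK SPACE is not a break point, so it stays inside a word.
    """
    lines: list[str] = []
    current = ""
    for word in text.split(" "):  # split on U+0020 only, never on U+00A0
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)  # the space at the break is dropped
            current = word
    if current:
        lines.append(current)
    return lines

print(wrap("Call me at 10 AM please", 13))       # ['Call me at 10', 'AM please']
print(wrap("Call me at 10\u00a0AM please", 13))  # ['Call me at', '10\xa0AM please']
```

With a normal space, "10" and "AM" can end up on different lines. The non-breaking space keeps them together.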
Hair spaces around dashes
Typically, an en dash is surrounded by two normal spaces, while an em dash is set continuous with the text. However, an em dash can optionally be surrounded by a so-called hair space (U+200A, 8202 decimal). This space is much thinner than a normal space and is seldom used on its own. In HTML it can be written with the numeric character reference `&#8202;` or `&#x200A;`. Unfortunately, very few user agents render a hair space correctly: in most cases the result is an unwanted symbol or a question mark on the screen, depending on the font and renderer capabilities.
| Spacing | HTML source | Result |
|---|---|---|
| Normal space | `left right` | left right |
| Normal space with em dash | `left — right` | left — right |
| Hair space with em dash | `left&#8202;—&#8202;right` | left — right |
| No space with em dash | `left—right` | left—right |
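The following Python sketch produces the numeric character references for the hair-space example. The helper name `numeric_char_ref` is hypothetical. The standard-library `html.unescape` is used only to confirm that the references decode back to the intended characters.

```python
import html

def numeric_char_ref(ch: str, hexadecimal: bool = False) -> str:
    """Return the HTML numeric character reference for a single character."""
    cp = ord(ch)
    return f"&#x{cp:X};" if hexadecimal else f"&#{cp};"

HAIR_SPACE = "\u200a"
EM_DASH = "\u2014"

print(numeric_char_ref(HAIR_SPACE), numeric_char_ref(HAIR_SPACE, hexadecimal=True))
# &#8202; &#x200A;

# Hair spaces around an em dash, written as HTML source.
hs = numeric_char_ref(HAIR_SPACE)
source = f"left{hs}{EM_DASH}{hs}right"
# Decoding the references yields the characters a browser would lay out.
print(html.unescape(source) == f"left{HAIR_SPACE}{EM_DASH}{HAIR_SPACE}right")  # True
```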
Table of Spaces
Unicode defines several space characters with specific semantics and rendering characteristics, as shown in the table below. Depending on the browser and fonts used to view this table, not all spaces may display properly:
| Code | No break | HTML reference | Name | Block | Description |
|---|---|---|---|---|---|
| U+0020 | No | `&#32;` | Space | Basic Latin | Normal space, same as ASCII character 0x20 |
| U+00A0 | Yes | `&#160;` | No-Break Space | Latin-1 Supplement | Identical to U+0020, but not a point at which a line may be broken |
| U+1680 | No | `&#5760;` | Ogham Space Mark | Ogham | Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font. |
| U+2002 | No | `&#8194;` | En Space, or Nut | General Punctuation | Width of one en (half of one em) |
| U+2003 | No | `&#8195;` | Em Space, or Mutton | General Punctuation | Width of one em |
| U+2004 | No | `&#8196;` | Three-Per-Em Space, or Thick Space | General Punctuation | One third of an em wide |
| U+2005 | No | `&#8197;` | Four-Per-Em Space, or Mid Space | General Punctuation | One fourth of an em wide |
| U+2006 | No | `&#8198;` | Six-Per-Em Space | General Punctuation | One sixth of an em wide. In computer typography sometimes equated to U+2009. |
| U+2007 | Yes | `&#8199;` | Figure Space | General Punctuation | In fonts with monospaced digits, equal to the width of one digit |
| U+2008 | No | `&#8200;` | Punctuation Space | General Punctuation | As wide as the narrow punctuation in a font |
| U+2009 | No | `&#8201;` | Thin Space | General Punctuation | One fifth (sometimes one sixth) of an em wide |
| U+200A | No | `&#8202;` | Hair Space | General Punctuation | Thinner than a thin space |
| U+200B | No | `&#8203;` | Zero-Width Space | General Punctuation | Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing; normally not a visible separation, but it may expand in passages that are fully justified. In HTML pages this space can be used as a potential line-break in long words as a replacement for the deprecated |
| U+202F | Yes | `&#8239;` | Narrow No-Break Space | General Punctuation | Similar to U+00A0 No-Break Space |
| U+205F | No | `&#8287;` | Medium Mathematical Space | General Punctuation | Used in mathematical formulae |
| U+2060 | Yes | `&#8288;` | Word Joiner | General Punctuation | Identical to U+200B, but not a point at which a line may be broken. Introduced in Unicode 3.2 to replace the deprecated "zero width no-break space" function of the U+FEFF character. |
| U+3000 | No | `&#12288;` | Ideographic Space | CJK Symbols and Punctuation | As wide as a CJK character cell |
| U+FEFF | Yes | `&#65279;` | Zero Width No-Break Space = Byte Order Mark (BOM) | Arabic Presentation Forms-B | Used primarily as a Byte Order Mark character. Use as an indication of non-breaking is deprecated as of Unicode 3.2. See U+2060 instead. |
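Python's standard `unicodedata` module can list the Unicode name and general category of each code point in the table above. The helper `describe` and the list `SPACE_CODE_POINTS` are illustrative. The output shows that not every entry is a "space separator" (category Zs). The zero-width characters U+200B, U+2060 and U+FEFF are format characters (Cf), and Python's `str.isspace()` does not count them as whitespace.

```python
import unicodedata

# Code points from the table of spaces (U+2002 through U+200B via range()).
SPACE_CODE_POINTS = [0x0020, 0x00A0, 0x1680, *range(0x2002, 0x200C),
                     0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF]

def describe(cp: int) -> tuple[str, str, str, bool]:
    """Return (code label, Unicode name, general category, str.isspace())."""
    ch = chr(cp)
    return f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

for row in map(describe, SPACE_CODE_POINTS):
    print(*row, sep="\t")
```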
Unicode also provides some visible characters to stand in for space when necessary in the "Control Pictures" block: the Symbol For Space (U+2420), the Blank Symbol (U+2422), and the Open Box (U+2423).
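Here is a small Python sketch of how an editor might make ordinary spaces visible using one of these stand-ins. The mapping of U+0020 to U+2423 OPEN BOX is a hypothetical choice; U+2420 SYMBOL FOR SPACE could be substituted the same way.

```python
import unicodedata

# Hypothetical mapping: show U+0020 SPACE as U+2423 OPEN BOX.
SHOW_SPACES = str.maketrans({" ": "\u2423"})

def make_spaces_visible(text: str) -> str:
    """Replace ordinary spaces with a visible stand-in; other characters are unchanged."""
    return text.translate(SHOW_SPACES)

print(make_spaces_visible("two  spaces"))  # two␣␣spaces
print([unicodedata.name(c) for c in "\u2420\u2422\u2423"])
# ['SYMBOL FOR SPACE', 'BLANK SYMBOL', 'OPEN BOX']
```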
Space characters in markup languages
Space characters appearing in inconsequential places within element start tags in both XML and HTML are generally ignored by processors of those markup languages. For example, spaces that appear on either side of the "=" that separates an attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in XML can contain spaces before the "/>".
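The following sketch uses Python's standard `xml.etree.ElementTree` parser. It shows that these spaces inside tags do not affect the parsed result: the "tight" and "loose" documents are illustrative and produce identical trees.

```python
import xml.etree.ElementTree as ET

# Spaces around "=", before "/>", and before the ">" of an end tag are
# insignificant inside tags, so both documents parse to the same tree.
tight = '<root id="1"><item/></root>'
loose = '<root   id = "1" ><item   /></root >'

a, b = ET.fromstring(tight), ET.fromstring(loose)
print(a.get("id") == b.get("id"), ET.tostring(a) == ET.tostring(b))  # True True
```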
In XML attribute values, the parser normalizes each whitespace character (tab, line feed, carriage return) to a single space when it reads the document. Sequences of spaces are additionally collapsed into one, and leading and trailing spaces removed, only for attributes declared in a DTD with a type other than CDATA[1]. Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply similar rules to element content. An XML document author can put the xml:space="preserve" attribute on an element to signal to applications that whitespace in that element's content should be preserved.
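Python's expat-based `xml.etree.ElementTree` can illustrate attribute-value normalization. This sketch assumes a document with no DTD, so every attribute is treated as CDATA. The attribute names are illustrative. A literal tab or newline becomes a space, a run of spaces is kept as-is, and a character reference such as `&#9;` survives as a real tab.

```python
import xml.etree.ElementTree as ET

# "\t" and "\n" below are real tab and newline characters inside the attribute.
doc = '<a literal="x\ty\nz" spaced="x   y" escaped="x&#9;y"/>'
attrs = ET.fromstring(doc).attrib

print(repr(attrs["literal"]))  # 'x y z'    each whitespace char -> one space
print(repr(attrs["spaced"]))   # 'x   y'    CDATA: runs are not collapsed
print(repr(attrs["escaped"]))  # 'x\ty'     character references are not normalized
```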
In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may appear as a single space character when rendering text in a language that normally puts spaces between words.[2] Renderers are required to treat whitespace more literally in certain elements, such as pre, and in any element whose CSS white-space property requests pre-like processing (for example, white-space: pre). In such elements, space characters are not collapsed into inter-word separators.
In both XML and HTML, the non-breaking space character is not treated as "whitespace", so it is not subject to the rules above.
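The function below is a deliberately simplified Python model of this collapsing. It ignores CSS details such as segment-break handling and trimming at line edges. The function name is hypothetical. Only HTML's ASCII whitespace (tab, line feed, form feed, carriage return, space) is collapsed, so U+00A0 passes through untouched. The regular expression `\s` would be the wrong tool here, because in Python `str` patterns it also matches U+00A0.

```python
import re

# HTML's ASCII whitespace; U+00A0 NO-BREAK SPACE is deliberately absent.
HTML_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")

def collapse_html_whitespace(text: str) -> str:
    """Simplified model of white-space: normal; each run becomes one space."""
    return HTML_WHITESPACE_RUN.sub(" ", text)

print(repr(collapse_html_whitespace("a \t\n  b")))             # 'a b'
print(repr(collapse_html_whitespace("a\u00a0\u00a0\u00a0b")))  # 'a\xa0\xa0\xa0b'
# Python's \s also matches U+00A0, so it would wrongly collapse it:
print(repr(re.sub(r"\s+", " ", "a\u00a0\u00a0\u00a0b")))       # 'a b'
```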
See also
★ Hard space
★ Internal field separator
★ Non-breaking space
★ Hyphenation
External links
★ Unicode spaces, by Jukka "Yucca" Korpela.
★ Commonly confused characters
This article is provided by Wikipedia.