KD-TREE
In computer science, a '''k''d-tree' (short for ''k-dimensional tree'') is a space-partitioning data structure for organizing points in a ''k''-dimensional space. ''k''d-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbour searches). ''k''d-trees are a special case of BSP trees.
A ''k''d-tree uses only splitting planes that are perpendicular to one of the coordinate system axes. This differs from BSP trees, in which arbitrary splitting planes can be used. In addition, in the typical definition every node of a ''k''d-tree, from the root to the leaves, stores a point.[1] This differs from BSP trees, in which leaves are typically the only nodes that contain points (or other geometric primitives). As a consequence, each splitting plane must go through one of the points in the ''k''d-tree. ''k''d-tries are a variant that store data only in leaf nodes. It is worth noting that in an alternative definition of ''k''d-tree the points are stored in its leaf nodes only, although each splitting plane still goes through one of the points.[2]
Nomenclature
Technically, the letter ''k'' refers to the number of dimensions. A 3-dimensional ''k''d-tree would be called a 3d-tree. However, the phrase "3-dimensional ''k''d-tree" is more commonly used. (It is also more descriptive, since a 3-dimensional tree could be any of a variety of different things, but the term ''k''d-tree refers to a specific type of space-partitioning tree.) The letters ''k'' and d are both lowercase, even when the term comes at the beginning of a sentence, and the ''k'' is in italics. However, variant spellings are common, such as KD-tree and Kd-tree.
2D and 3D are examples of uppercase D meaning "dimensional".
Operations on ''k''d-trees
Constructing a ''k''d-tree
Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways to construct ''k''d-trees. The canonical method of ''k''d-tree construction has the following constraints:
★ As one moves down the tree, one cycles through the axes used to select the splitting planes. (For example, the root would have an ''x''-aligned plane, the root's children would both have ''y''-aligned planes, the root's grandchildren would all have ''z''-aligned planes, and so on.)
★ At each step, the point selected to create the splitting plane is the median of the points being put into the ''k''d-tree, with respect to their coordinates in the axis being used. (Note the assumption that we feed the entire set of points into the algorithm up-front.)
This method leads to a balanced ''k''d-tree, in which each leaf node is about the same distance from the root. However, balanced trees are not necessarily optimal for all applications.
Note also that it is not ''required'' to select the median point. In that case, the result is simply that there is no guarantee that the tree will be balanced.
Given a list of ''n'' points, the following algorithm will construct a balanced ''k''d-tree containing those points.
'function' kdtree (''list of points'' pointList, ''int'' depth)
{
'if' pointList 'is empty'
'return' 'nil';
'else'
{
''// Select axis based on depth so that axis cycles through all valid values''
'var' ''int'' axis := depth 'mod' k;
''// Sort point list and choose median as pivot element''
'select' median 'from' pointList;
''// Create node and construct subtrees''
'var' ''tree_node'' node;
node.location := median;
node.leftChild := kdtree(points 'in' pointList 'before' median, depth+1);
node.rightChild := kdtree(points 'in' pointList 'after' median, depth+1);
'return' node;
}
}
It is common that points "after" the median include only ones that are greater than or equal to the median. Another approach is to define a "superkey" function that compares the points in other dimensions. Lastly, it may be acceptable to let points equal to the median lie on either side.
This algorithm implemented in the Python programming language is as follows:
'class' Node:
'pass'
'def' kdtree(pointList, depth=0):
'if' 'not' pointList:
'return' None
''# Select axis based on depth so that axis cycles through all valid values''
k = len(pointList[0]) # Assumes all points have the same dimension
axis = depth % k
''# Sort point list to select median''
pointList.sort(key='lambda' x: x[axis])
median = len(pointList) // 2 ''# Choose median''
''# Create node and construct subtrees''
node = Node()
node.location = pointList[median]
node.leftChild = kdtree(pointList[:median], depth+1)
node.rightChild = kdtree(pointList[median+1:], depth+1)
'return' node
Example usage would be:
pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
tree = kdtree(pointList)
The tree generated is shown on the right.
This algorithm creates the invariant that for any node, all the nodes in the left subtree are on one side of a splitting plane, and all the nodes in the right subtree are on the other side. Points that lie on the splitting plane may appear on either side. The splitting plane of a node goes through the point associated with that node (referred to in the code as ''node.location'').
Adding elements to a ''k''d-tree
One adds a new point to a ''k''d-tree in the same way as one adds an element to any other tree. First, traverse the tree, starting from the root and moving to either the left or the right child depending on whether the point to be inserted is on the "left" or "right" side of the splitting plane. Once you get to a leaf node, add the new point as either the left or right child of the leaf node, again depending on which side of the node's splitting plane contains the new point.
Removing elements from a ''k''d-tree
Remove a point from an existing ''k''d-tree, without breaking the invariant.
Balancing a ''k''d-tree
Balancing a ''k''d-tree requires care. Because ''k''d-trees are sorted in multiple dimensions, the tree rotation technique cannot be used to balance them — this may break the invariant.
Nearest neighbor in a ''k''d-tree
The nearest neighbor (NN) algorithm, to find the NN to a given target point not in the tree, relies on the ability to discard large portions of the tree by performing a simple test. To perform the NN calculation, the tree is searched in a depth-first fashion, refining the nearest distance. First the root node is examined with an initial assumption that the smallest distance to the next point is infinite. The subdomain (right or left), which is a hyperrectangle, containing the target point is searched. This is done recursively until a final minimum region containing the node is found. The algorithm then (through recursion) examines each parent node, seeing if it is possible for the other domain to contain a point that is closer. This is performed by testing for the possibility of intersection between the hyperrectangle and the hypersphere (formed by target node and current minimum radius). If the rectangle that has not been recursively examined yet does not intersect this sphere, then there is no way that the rectangle can contain a point that is a better nearest neighbour. This is repeated until all domains are either searched or discarded, thus leaving the nearest neighbour as the final result. In addition to this one also has the distance to the nearest neighbour on hand as well. Finding the nearest point is an O(logN) operation.
The algorithm can be extended in several ways by simple modifications.
Approximate nearest neighbour can be achieved by simply setting an upper bound on the number points to examine in the tree, or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations). Nearest neighbour for points that are in the tree already can be achieved by not updating the refinement for nodes that give zero distance as the result.
Approximate nearest neighbor is useful in real time applications such as robotics due to the significant speed increased gained by not searching for the best point exhaustively.
''k''d-tree Applications
Orthogonal range search in a ''k''d-tree
Use a ''k''d-tree to find all the points that lie within a given rectangle (or a hyperrectangle). This operation is also called an orthogonal range search.
Determining where to evaluate a surface
In local regression, it is common to evaluate the fitted surface directly only at the vertices of a ''k''d-tree and to interpolate elsewhere. This use, which is pictured in the image above, is to ensure that only as many direct evaluations are performed as are necessary. Since the ''k''d-tree "adapts" itself to the space of predictor variables, this method can provide an excellent approximation to the true loess surface. If the approximation is poor, it can be improved by further subdivision of the tree's cel
Complexity
★ Building a static ''k''d-tree from ''n'' points takes O(''n'' log ''n'') time.
★ Inserting a new point into a balanced ''k''d-tree takes O(log ''n'') time.
★ Removing a point from a balanced ''k''d-tree takes O(log ''n'') time.
★ Querying an axis-parallel range in a balanced ''k''d-tree takes O(''n''1-1/d +''k'') time, where k is the number of the reported points, and d the dimension of the ''k''d-tree.
See also
★ implicit ''k''d-tree
★ min/max ''k''d-tree
★ Octree
★ Bounding Interval Hierarchy
★ Klee's measure problem
References
★ Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (Sep. 1975), 509–517.
★ Bentley, J. L. 1990. K-d Trees for Semidynamic Point Sets. SCG '90: Proc. 6th Annual Symposium on Computational Geometry (1990), 187–197
★ H. Samet, ''The Design and Analysis of Spatial Data Structures'', Addison-Wesley, Reading, MA, 1990.
This article provided by Wikipedia. To edit the contents of this article, click here for original source.
psst.. try this: add to faves

العربية
中国
Français
Deutsch
Ελληνική
हिन्दी
Italiano
日本語
Português
Русский
Español




