ASSOCIATION RULE LEARNING

In data mining and treatment learning, 'association rule learners' are used to discover elements that co-occur frequently within a data set [1] consisting of multiple independent selections of elements (such as purchasing transactions), and to discover rules, such as implication or correlation, which relate co-occurring elements. Questions such as "if a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?" are answered by association-finding algorithms. This application of association rule learners is also known as market basket analysis. As with most data mining techniques, the task is to reduce a potentially huge amount of information to a small, understandable set of statistically supported statements.
In text books as well as in the business literature, market basket analysis is often promoted as a means to obtain product associations to base a retailer’s promotion strategy on. They argue that associated products with a high lift/interest can be promoted effectively by only discounting just one of the two products. Implicitly, they argue that market basket analysis automatically identifies complements. Academics [2], however, have shown that one should be careful with this conclusion. They show that this implicit assumption does not hold. Their empirical analysis reveals that market basket analysis identifies as many substitutes as complements. Therefore, market basket analysis should not be used to build a promotion expert system for retailers, unless supplemented by other, more empirical, methods of product relationship determination.

Contents
Technical Discussion
Fixed-Confidence Mining
Lore
Other types of Association Mining
See also
External links

Technical Discussion


The input for a typical associations-mining algorithm is a set T of ''itemsets'' t, each of which is drawn from a set I of all possible items. Each t is a member of the power set 2^I, but T is not considered a subset of 2^I since it may contain duplicates (it is a multiset). Since I is typically large, the general problem of finding all common subsets in an arbitrary selection of itemsets is considered intractable. Therefore input sets in T, and any results derived therefrom, are typically assumed to be small. It is an ongoing area of research to find algorithms which relax this assumption and allow processing of larger sets.

Fixed-Confidence Mining


Commonly, association-finding algorithms attempt to find all sets of elements which occur in at least a fraction C of the data, where C is a chosen ''Confidence Threshold'' (e.g. 2%). The number of occurrences of a subset is called its ''support''. Sets whose support exceeds C are called ''frequent itemsets''.
If a set s is frequent, then any subset of s must also be, and most association-finding algorithms attempt to exploit this fact. Most association-finding algorithms reduce to a traversal of this subset lattice of I in some order, extending frequent itemsets and pruning out infrequent sets and their supersets.
The fixed confidence threshold has little basis in statistics, since some sets may exceed it simply by random coincidence (thereby defeating the goal of finding meaningful correlations), and some meaningful associations may occur in the data without reaching the threshold. However, in practice it does eliminate vast numbers of insignificant sets.
For a given data set, the set of its frequent itemsets can be described by its ''maximal'' frequent itemsets, which are frequent itemsets S that are not subsets of any larger frequent itemset T. During mining, finding maximal frequent itemsets first allows their subsets to be skipped, an important improvement if sets are large.

Lore


A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data.

Other types of Association Mining


'Contrast set learning' is a form of associative learning. 'Contrast set learners' use rules that differ meaningfully in their distribution across subsets [1].
'Weighted class learning' is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
'K-optimal pattern discovery' provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.

See also



Apriori algorithm

External links



★ [1] T. Menzies, Y. Hu, "Data Mining For Busy People." ''IEEE Computer'', October 2003, pgs. 18-25.

★ [2] B. Vindevogel, Dirk Van den Poel, and Geert Wets, "Why promotion strategies based on market basket analysis do not work?" ''Expert Systems with Applications'', 28 (3), 2005, pgs. 583-590.

This article provided by Wikipedia. To edit the contents of this article, click here for original source.

psst.. try this: add to faves