It also kind of Market Basket Analysis.
Introduction
An Association Rule is where ($X$ implies $Y$)
An item set is a set of items. If it has $k$ items, it is a $k$-itemset.
Support $s$ of an itemset $X$ is the percentage of transactions in $D$ that contains $X$.
Support of association rule $X \Longrightarrow Y$ is the support of the itemset $\{X,Y\}$.
Confidence of the rule $X \Longrightarrow Y$ is the ratio between the transactions that contain both $X$ and $Y$ and the number of transactions that have $X$ in $D$.
The problem is: Find association rules.
Given:
- A set $I$ of items
- database $D$ of transactions
- minimum support $s$
- minimum confidence $c$
Find: Association rules $X \Longrightarrow Y$ with a minimum support $s$ and minimum confidence $c$
Apriori Algorithm
There are two pinciples of the apriori algorithm:
- Any subset of a frequent itemset is also frequent
- Any superset of an infrequent itemset is also infrequent
(example see the reference 2)
Improvements
The limitation of confidence:
If $Y$ is independent of $X$: .
This means if you have a high probability of $p(Y)$ we have a rule with high confidence that associate independent itemset.
Lift: measure indicates departure from independence of $X$ and $Y$.
The lift of $X \Longrightarrow Y$ is:
But lift is symmetric, the same for $X \Longrightarrow Y$ as $Y \Longrightarrow X$
Conviction: indicates that $X$ and $Y$ are not independent, and takes into account the direction of implication.
Since the $p \rightarrow q \equiv \neg p \vee q$, and we can rewrite it as the $\neg( p \wedge \neg q)$. Therefore, the Conviction is based on this
Conviction is a measure of the implication and has value 1 if items are unrelated.
Reference
- Jo slide Market Basket
- Mining Association Rules