Association Rules Mining

It also kind of Market Basket Analysis.

Introduction

An Association Rule is where ($X$ implies $Y$)

An item set is a set of items. If it has $k$ items, it is a $k$-itemset.

Support $s$ of an itemset $X$ is the percentage of transactions in $D$ that contains $X$.

Support of association rule $X \Longrightarrow Y$ is the support of the itemset $\{X,Y\}$.

Confidence of the rule $X \Longrightarrow Y$ is the ratio between the transactions that contain both $X$ and $Y$ and the number of transactions that have $X$ in $D$.


The problem is: Find association rules.

Given:

  • A set $I$ of items
  • database $D$ of transactions
  • minimum support $s$
  • minimum confidence $c$

Find: Association rules $X \Longrightarrow Y$ with a minimum support $s$ and minimum confidence $c$


Apriori Algorithm

There are two pinciples of the apriori algorithm:

  • Any subset of a frequent itemset is also frequent
  • Any superset of an infrequent itemset is also infrequent

(example see the reference 2)

Improvements

The limitation of confidence:

If $Y$ is independent of $X$: .

This means if you have a high probability of $p(Y)$ we have a rule with high confidence that associate independent itemset.

Lift: measure indicates departure from independence of $X$ and $Y$.

The lift of $X \Longrightarrow Y$ is:

But lift is symmetric, the same for $X \Longrightarrow Y$ as $Y \Longrightarrow X$

Conviction: indicates that $X$ and $Y$ are not independent, and takes into account the direction of implication.

Since the $p \rightarrow q \equiv \neg p \vee q$, and we can rewrite it as the $\neg( p \wedge \neg q)$. Therefore, the Conviction is based on this

Conviction is a measure of the implication and has value 1 if items are unrelated.

Reference

  1. Jo slide Market Basket
  2. Mining Association Rules