Comparison of two association rule mining algorithms without candidate generation
Association rule mining techniques play an important role in data mining research, where the aim is to find interesting correlations among sets of items in databases. Although the Apriori algorithm is the one that boosted data mining research, it has a bottleneck in its candidate generation phase, which requires multiple passes over the source data. FP-Growth and Matrix Apriori are two algorithms that overcome this bottleneck by keeping the frequent itemsets in compact data structures, eliminating the need for candidate generation. To our knowledge, no previous work has compared these two similar algorithms with a focus on their performance in the different phases of execution. In this study, we compare the Matrix Apriori and FP-Growth algorithms. Two case studies analyzing the algorithms phase by phase are carried out on two synthetic datasets, generated i) to observe the algorithms' performance on datasets with different characteristics and ii) to understand the causes of performance differences in the different phases. Our findings are that i) the performance of the algorithms depends on the characteristics of the given dataset and on the threshold value, ii) Matrix Apriori outperforms FP-Growth in total performance for threshold values below 10%, and iii) although building the matrix data structure has a higher cost, finding itemsets is faster.
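To make "without candidate generation" concrete, the FP-Growth idea can be sketched in a few lines of Python. This is a minimal illustration of the textbook technique under our own assumptions, not the implementation evaluated in this study, and Matrix Apriori is not shown. Transactions are assumed to be given as (items, count) pairs, `min_count` is an absolute support count rather than a percentage threshold, and the names `FPNode`, `build_tree` and `fp_growth` are hypothetical. Two passes over the data build a compact prefix tree, and frequent itemsets are then grown recursively from conditional pattern bases instead of from generated candidate sets.

```python
from collections import defaultdict


class FPNode:
    """A node in the FP-tree: an item, its count, a parent link and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns the root, a header table (item -> nodes holding that item)
    and the supports of the frequent items.
    """
    # First pass: count the support of every item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)
    # Second pass: insert each transaction's frequent items in descending
    # support order, so transactions with common prefixes share tree paths.
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_count, suffix=()):
    """Yield (frequent itemset, support) pairs without candidate generation."""
    _, header, frequent = build_tree(transactions, min_count)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        yield tuple(sorted(suffix + (item,))), frequent[item]
        # Conditional pattern base: the prefix paths leading to each node
        # of `item`, weighted by that node's count.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional tree recursively, extending the suffix.
        yield from fp_growth(base, min_count, suffix + (item,))
```

As a usage example, `dict(fp_growth([(t, 1) for t in db], min_count=3))` returns every itemset contained in at least three transactions of a list of baskets `db`. Ordering items by descending support before insertion is the design choice that keeps the tree compact, because frequent items end up near the root and are shared by many paths.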