Research Topic

Research Topic: Improved Apriori Algorithm

The Apriori algorithm is a classic algorithm for mining frequent itemsets in transactional databases. It works bottom-up, level by level: it generates candidate itemsets of size k from the frequent itemsets of size k-1, then scans the database to count how often each candidate occurs. The FP-Growth algorithm, introduced by Han et al. in 2000, takes an alternative approach that avoids candidate generation by compressing the database into a prefix-tree structure (the FP-tree).
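
To make the level-wise process concrete, here is a minimal sketch of textbook Apriori in Python. It is an illustration only, not this project's implementation: the function name `apriori`, the absolute `min_support` count, and the use of `frozenset` keys are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one full scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, with the transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}` and `min_support=2`, the pair `{milk, butter}` occurs only once and is dropped, so the triple `{bread, milk, butter}` is pruned without ever being counted.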

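The traditional Apriori algorithm faces several challenges:

  • Multiple database scans: Each itemset size k requires its own full pass over the database
  • Large candidate sets: Many of the generated candidate itemsets turn out not to be frequent
  • Memory overhead: All candidate itemsets of the current level are held in memory while they are counted
  • Scalability issues: Runtime and memory use grow quickly as the number of transactions and distinct items increases or the minimum support threshold decreases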
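
The candidate-set problem can be seen with simple arithmetic. The sketch below computes an upper bound on the number of size-k candidates, assuming the worst case in which every smaller subset is frequent and nothing is pruned; the function name `candidate_upper_bound` is hypothetical.

```python
from math import comb


def candidate_upper_bound(num_frequent_items, k):
    """Upper bound on size-k candidates built from the frequent single items.

    Assumes the worst case: every (k-1)-subset is frequent, so nothing is pruned.
    """
    return comb(num_frequent_items, k)


# With 1,000 frequent items there can be up to 499,500 candidate pairs
# and up to 166,167,000 candidate triples.
pairs = candidate_upper_bound(1000, 2)
triples = candidate_upper_bound(1000, 3)
```

Real datasets usually prune well below this bound, but it shows why low support thresholds make candidate generation expensive.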

While FP-Growth addresses many of these issues, understanding both algorithms and their trade-offs is crucial for:

  • Selecting the appropriate algorithm for different scenarios
  • Developing improved variants
  • Understanding the theoretical foundations of frequent itemset mining

This project aims to improve the Apriori algorithm and evaluate the result. Its objectives are:

  1. Reduce the number of database scans
  2. Optimize candidate generation
  3. Improve memory efficiency
  4. Reduce overall runtime
  5. Compare performance with the FP-Growth algorithm

The research will involve:

  • Literature review of existing improvements to Apriori
  • Analysis of the FP-Growth algorithm and its advantages
  • Algorithm design and analysis for improved Apriori
  • Implementation of both the improved Apriori and FP-Growth algorithms
  • Experimental evaluation on benchmark datasets
  • Performance comparison between improved Apriori and FP-Growth
  • Analysis of trade-offs and use cases for each algorithm

The project's contributions and their current status are:

  • Novel improvements to the Apriori algorithm: Implemented Weighted Apriori with intersection-based counting
  • Comprehensive performance analysis: Runtime tracking with detailed metrics
  • Open-source implementation: Complete implementations of all three algorithms
  • Detailed documentation: Comprehensive documentation of algorithms and usage
  • Performance comparison: Comparison between improved Apriori and FP-Growth (in progress)
  • Guidelines for algorithm selection: Based on dataset characteristics (in progress)
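
The intersection-based counting mentioned in the list above can be illustrated with a vertical data layout, in which each item maps to the set of transaction IDs that contain it. This is a hedged sketch of the general technique, not the project's Weighted Apriori: the source does not specify the weighting scheme, so weights are omitted, and the names `build_tidsets` and `support` are hypothetical.

```python
def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (positions) containing it.

    This takes a single pass over the database.
    """
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support(itemset, tidsets):
    """Support count of an itemset, computed by intersecting its items' tid-sets.

    No further database scan is needed once the tid-sets exist.
    """
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))
```

Once the tid-sets are built, the support of any candidate is the size of an intersection, which trades repeated database scans for the memory needed to hold the tid-sets.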