
Experiments

The experiments use the Amazon Reviews 2023 dataset from Hugging Face, specifically the Appliances category:

  • Total records: 2,128,605 reviews
  • Verified purchases: Only reviews marked as verified purchases are kept
  • Transactions: 8,655 transactions (after preprocessing)
  • Preprocessing: Parent ASIN used for grouping, min_transaction_size=2, infrequent items filtered (min_item_frequency=3)
  • Baseline 1: Traditional Apriori algorithm
  • Baseline 2: FP-Growth algorithm (standard implementation)
  • Test: Improved Apriori algorithm (Weighted Apriori with intersection-based counting)
  • Metrics: Execution time (microsecond precision), scalability across support thresholds, correctness verification
  • Minimum Support Threshold: 0.0005 (0.05%) for the main comparison; varied from 0.05% to 5.0% for the scalability analysis
  • Minimum Confidence: 0.5 (50%)
  • Preprocessing: Parent ASIN grouping, min_transaction_size=2, min_item_frequency=3
  • Runtime Measurement: time.perf_counter(), with times reported to microsecond precision
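
The threshold and timing settings above can be illustrated with a short sketch. It is not the project's actual harness: the helper names `min_support_count` and `time_call`, the round-up rule for converting a relative support into a transaction count, and the best-of-N repetition are all assumptions made for illustration. Only `time.perf_counter()`, the 0.0005 threshold, and the 8,655 transaction count come from the setup described here.

```python
import math
import time


def min_support_count(min_support, n_transactions):
    """Convert a relative support threshold into an absolute transaction count."""
    # Assumption: round up, so an itemset must occur in at least this many transactions.
    return math.ceil(min_support * n_transactions)


def time_call(fn, *args, repeats=3):
    """Run fn(*args) several times; return (best elapsed time in microseconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # perf_counter() returns fractional seconds; scale to microseconds for reporting.
    return best * 1_000_000, result


if __name__ == "__main__":
    # 0.0005 * 8655 = 4.3275, which rounds up to 5 transactions.
    print(min_support_count(0.0005, 8655))
    elapsed_us, total = time_call(sum, range(1000))
    print(f"sum took {elapsed_us:.1f} microseconds, result={total}")
```

Under the round-up assumption, the main comparison threshold of 0.05% corresponds to an itemset appearing in at least 5 of the 8,655 transactions, which helps explain why the low end of the support range is the most demanding case for candidate generation.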
  1. Data Preparation:

    • Load dataset from Hugging Face
    • Filter verified purchases
    • Create transactions using Parent ASIN
    • Apply preprocessing filters (transaction size, item frequency)
    • See Data Preprocessing for the detailed preprocessing pipeline
  2. Data Exploration:

    • Analyze dataset characteristics
    • Determine optimal preprocessing parameters
    • Visualize transaction patterns and item frequencies
    • See Exploratory Data Analysis for the detailed EDA methodology
  3. Baseline Execution:

    • Run traditional Apriori algorithm with runtime tracking
    • Run FP-Growth algorithm with runtime tracking
  4. Improved Execution:

    • Run Improved Apriori algorithm with detailed runtime breakdown
    • Verify correctness by comparing results with traditional Apriori
  5. Performance Measurement:

    • Record execution time for all phases (using microsecond precision)
    • Measure scalability across different support thresholds
    • Track memory usage patterns
  6. Result Analysis:

    • Compare performance across all three algorithms
    • Verify correctness through itemset matching
    • Analyze scalability characteristics
  7. Visualization:

    • Generate comprehensive visualizations for each algorithm
    • Create comparison visualizations
    • Generate runtime vs support threshold analysis
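
Steps 1 and 4 of the procedure above can be sketched on toy data. This is a minimal illustration, not the project's code. It assumes that reviews arrive as `(user_id, parent_asin, verified)` tuples, that a transaction is the set of Parent ASINs reviewed by one user, and that the item-frequency filter is applied before the transaction-size filter. All function names are hypothetical. The intersection-based count uses per-item transaction-ID sets, one common way to implement such counting, and is checked against a plain scan in the spirit of the correctness verification step.

```python
from collections import Counter, defaultdict


def build_transactions(reviews, min_transaction_size=2, min_item_frequency=3):
    """Group verified reviews into per-user item sets, then apply both filters."""
    baskets = defaultdict(set)
    for user_id, parent_asin, verified in reviews:
        if verified:  # keep verified purchases only
            baskets[user_id].add(parent_asin)
    # Count in how many baskets each item occurs, and keep the frequent ones.
    freq = Counter(item for items in baskets.values() for item in items)
    keep = {item for item, count in freq.items() if count >= min_item_frequency}
    transactions = []
    for items in baskets.values():
        filtered = items & keep
        if len(filtered) >= min_transaction_size:
            transactions.append(frozenset(filtered))
    return transactions


def build_tid_index(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def support_by_intersection(itemset, index):
    """Support count = size of the intersection of the items' transaction-ID sets."""
    tid_sets = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets)) if tid_sets else 0


def support_by_scan(itemset, transactions):
    """Reference count: scan every transaction, as traditional Apriori does."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= t)


if __name__ == "__main__":
    reviews = [
        ("u1", "A", True), ("u1", "B", True),
        ("u2", "A", True), ("u2", "B", True), ("u2", "C", True),
        ("u3", "A", True), ("u3", "B", True), ("u3", "C", False),
        ("u4", "A", True),
    ]
    transactions = build_transactions(reviews)
    index = build_tid_index(transactions)
    pair = ("A", "B")
    # Both counting strategies must agree for the result to be considered correct.
    assert support_by_intersection(pair, index) == support_by_scan(pair, transactions)
    print(len(transactions), support_by_intersection(pair, index))
```

The intersection approach avoids rescanning the full transaction list for every candidate, which is the usual motivation for this style of counting; whether the Improved Apriori implementation stores its index exactly this way is not stated here.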

See the Results page for detailed experimental results and comparative analysis.