Experiments
Experimental Setup
Datasets
The experiments use the Amazon Reviews 2023 dataset from Hugging Face, specifically the Appliances category:
- Total records: 2,128,605 reviews
- Verified purchases: only reviews marked as verified purchases are kept
- Transactions: 8,655 (after preprocessing)
- Preprocessing: Parent ASIN used for grouping, min_transaction_size=2, infrequent items filtered (min_item_frequency=3)
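The preprocessing listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: it assumes each review is a dict with the `user_id`, `parent_asin`, and `verified_purchase` fields, that one transaction is the set of parent ASINs reviewed by one user, and that infrequent items are removed before short transactions are dropped. The function name `build_transactions` is hypothetical.

```python
from collections import Counter, defaultdict


def build_transactions(records, min_transaction_size=2, min_item_frequency=3):
    """Turn review records into a list of transactions (frozensets of parent ASINs).

    Assumptions (not confirmed by the source): one basket per user_id, and
    item filtering is applied before the transaction-size filter.
    """
    # Keep verified purchases only and collect one basket per user.
    baskets = defaultdict(set)
    for record in records:
        if record["verified_purchase"]:
            baskets[record["user_id"]].add(record["parent_asin"])

    # Count the number of baskets in which each item appears.
    frequency = Counter(item for items in baskets.values() for item in items)
    frequent_items = {
        item for item, count in frequency.items() if count >= min_item_frequency
    }

    # Drop infrequent items, then drop baskets that became too small.
    transactions = []
    for items in baskets.values():
        kept = frozenset(items & frequent_items)
        if len(kept) >= min_transaction_size:
            transactions.append(kept)
    return transactions
```

Filtering items before transactions means a basket can fall below `min_transaction_size` only because its rare items were removed; applying the filters in the opposite order would generally give a different transaction count.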
Experimental Design
- Baseline 1: Traditional Apriori algorithm
- Baseline 2: FP-Growth algorithm (standard implementation)
- Test: Improved Apriori algorithm (Weighted Apriori with intersection-based counting)
- Metrics: Execution time (microsecond precision), scalability across support thresholds, correctness verification
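To make "intersection-based counting" concrete, the sketch below shows one common reading of the term: each item keeps the set of transaction IDs that contain it, and the support count of an itemset is the size of the intersection of those sets, which avoids rescanning every transaction per candidate. This is an assumption about the technique, not the project's implementation, and it does not cover the weighting part. The names `build_tid_lists` and `support_count` are hypothetical.

```python
from collections import defaultdict


def build_tid_lists(transactions):
    """Map each item to the set of transaction IDs (indices) that contain it."""
    tid_lists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists[item].add(tid)
    return dict(tid_lists)


def support_count(itemset, tid_lists):
    """Count the transactions containing every item of `itemset` via set intersection."""
    # Start from the smallest TID set so intermediate results stay small.
    tid_sets = sorted((tid_lists.get(item, set()) for item in itemset), key=len)
    if not tid_sets:
        return 0
    common = set(tid_sets[0])
    for tids in tid_sets[1:]:
        common &= tids
        if not common:  # No transaction can match any more; stop early.
            break
    return len(common)
```

A traditional Apriori pass would instead test each candidate against every transaction; the two approaches should return identical counts, which is what makes a correctness comparison between them meaningful.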
Experimental Parameters
- Minimum Support Threshold: 0.0005 (0.05%) for the main comparison; varied from 0.05% to 5.0% for the scalability analysis
- Minimum Confidence: 0.5 (50%)
- Preprocessing: Parent ASIN grouping, min_transaction_size=2, min_item_frequency=3
- Runtime Measurement: Using time.perf_counter() for microsecond-level precision
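As an illustration of these parameters, the sketch below converts a relative support threshold into an absolute transaction count and wraps a call in `time.perf_counter()`. The helper names `min_support_count` and `timed` are hypothetical, and rounding the threshold up with `math.ceil` is an assumption; an implementation may instead compare fractional supports directly.

```python
import math
import time


def min_support_count(min_support, n_transactions):
    """Smallest number of transactions an itemset needs to reach `min_support`."""
    return math.ceil(min_support * n_transactions)


def timed(fn, *args, **kwargs):
    """Run `fn` once and return (result, elapsed_seconds) using time.perf_counter()."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# With the 8,655 transactions reported above, a 0.05% threshold
# corresponds to at least 5 supporting transactions (0.0005 * 8655 = 4.3275).
threshold = min_support_count(0.0005, 8655)
```

`time.perf_counter()` returns fractional seconds from a monotonic, high-resolution clock; the resolution actually achieved depends on the platform, and a single run can still be noisy, so repeating a measurement and reporting a median is a common precaution.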
Methodology
- Data Preparation:
  - Load dataset from Hugging Face
  - Filter verified purchases
  - Create transactions using Parent ASIN
  - Apply preprocessing filters (transaction size, item frequency)
  - See Data Preprocessing for detailed preprocessing pipeline
- Data Exploration:
  - Analyze dataset characteristics
  - Determine optimal preprocessing parameters
  - Visualize transaction patterns and item frequencies
  - See Exploratory Data Analysis for detailed EDA methodology
- Baseline Execution:
  - Run traditional Apriori algorithm with runtime tracking
  - Run FP-Growth algorithm with runtime tracking
- Improved Execution:
  - Run Improved Apriori algorithm with detailed runtime breakdown
  - Verify correctness by comparing results with traditional Apriori
- Performance Measurement:
  - Record execution time for all phases (using microsecond precision)
  - Measure scalability across different support thresholds
  - Track memory usage patterns
- Result Analysis:
  - Compare performance across all three algorithms
  - Verify correctness through itemset matching
  - Analyze scalability characteristics
- Visualization:
  - Generate comprehensive visualizations for each algorithm
  - Create comparison visualizations
  - Generate runtime vs support threshold analysis
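The correctness check by itemset matching described in the steps above could look like the following sketch. It assumes, hypothetically, that each algorithm returns a mapping from an itemset (any iterable of items) to its support value; the names `normalize` and `compare_itemsets` are illustrative and not taken from the project.

```python
def normalize(itemsets):
    """Key every itemset by a frozenset so item order and container type do not matter."""
    return {frozenset(items): support for items, support in itemsets.items()}


def compare_itemsets(reference, candidate, tol=1e-9):
    """Compare two {itemset: support} mappings and report any differences."""
    ref, cand = normalize(reference), normalize(candidate)
    missing = set(ref) - set(cand)   # found by the reference only
    extra = set(cand) - set(ref)     # found by the candidate only
    mismatched = {
        items for items in set(ref) & set(cand)
        if abs(ref[items] - cand[items]) > tol
    }
    return {
        "missing": missing,
        "extra": extra,
        "mismatched": mismatched,
        "match": not (missing or extra or mismatched),
    }
```

Reporting the missing, extra, and mismatched itemsets separately, rather than a single boolean, makes it easier to tell whether a discrepancy comes from candidate generation or from support counting.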
Results Summary
See the Results page for detailed experimental results and comparative analysis.