NDCG Score: Fine-Tuning Recommendation System Evaluation
Introduction: Why NDCG?
NDCG (Normalized Discounted Cumulative Gain) is a ranking-centric metric. Unlike accuracy-based approaches, it emphasizes the position of relevant items, which is crucial for understanding user satisfaction in recommender systems.
The NDCG score also explains how well the top-N recommendations match the actual user interactions. DCG measures the gain of items based on their positions; NDCG normalises it.
From an ML lead's point of view, a higher NDCG means users see more relevant items up front. This makes it consistent with other KPIs such as engagement and conversion.
Basic Fundamentals: DCG & IDCG
- DCG: Sum of the relevance scores, each divided by log2(position + 1), so that lower-ranked items contribute less.
 
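import numpy as np

def dcg_at_k(relevance, k):
    """Compute DCG@K: relevance at rank i is discounted by log2(i + 1)."""
    relevance = np.asarray(relevance, dtype=float)[:k]
    # Ranks are 1-based, so the discounts are log2(2), log2(3), ...
    discounts = np.log2(np.arange(2, relevance.size + 2))
    return np.sum(relevance / discounts)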
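- IDCG: Ideal DCG, the DCG you would get if the items were sorted by descending relevance, so that the most relevant items sit at the top.
- NDCG: DCG divided by IDCG. For non-negative relevance scores the result lies between 0 and 1, which makes scores easy to compare across different datasets.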
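
The three definitions above can be combined into one small function. The following is a minimal sketch written with only the standard library; `ndcg_at_k` and the convention of returning 0 when no item is relevant are assumptions made for illustration, not taken from the scripts referenced in this article.

```python
import math

def dcg_at_k(relevance, k):
    """DCG@k: relevance at 1-based rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))

def ndcg_at_k(relevance, k):
    """NDCG@k = DCG@k / IDCG@k, with IDCG taken over the same items re-sorted."""
    ideal = sorted(relevance, reverse=True)
    idcg = dcg_at_k(ideal, k)
    # Assumed convention: a list with no relevant items scores 0.
    return dcg_at_k(relevance, k) / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))  # items already in ideal order
print(ndcg_at_k([0, 1, 2, 3], k=4))  # same items, worst order
```

The input list holds the true relevance of the items in the order the model ranked them, so the same set of items can score very differently depending on that order.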
 
Hello World Example: Simple NDCG Demo
The table below shows how a small difference in ordering affects NDCG.
| Scenario | k | NDCG Score | 
|---|---|---|
| Bad Predictions | 5 | 0.6957 | 
| Better Predictions | 5 | 0.7182 | 
| Ideal Ranking | 5 | 1.0000 | 
| Truncated (k=4) | 4 | 0.3520 | 
| Tied Scores | 5 | 0.5000 | 
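
The relevance lists behind this table are not shown in the article, so the sketch below uses hypothetical true relevance values and predicted scores. It will not reproduce the numbers above; it only illustrates how ranking the same five items by different predicted scores changes NDCG@5. The helper name `ndcg_from_scores` is an assumption.

```python
import math

def dcg_at_k(relevance, k):
    """DCG@k with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))

def ndcg_from_scores(true_relevance, predicted_scores, k):
    """Rank items by predicted score (descending), then compute NDCG@k."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    idcg = dcg_at_k(sorted(true_relevance, reverse=True), k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0

# Hypothetical data: five items with graded true relevance.
true_relevance = [3, 2, 3, 0, 1]
bad_scores = [0.1, 0.4, 0.2, 0.9, 0.8]     # irrelevant item ranked first
better_scores = [0.9, 0.4, 0.7, 0.1, 0.8]  # one low-relevance item ranked too high
ideal_scores = [0.9, 0.6, 0.8, 0.1, 0.3]   # agrees with the true ordering

for name, scores in [("bad", bad_scores), ("better", better_scores),
                     ("ideal", ideal_scores)]:
    print(name, round(ndcg_from_scores(true_relevance, scores, k=5), 4))
```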
Handling Larger Systems: Implicit vs. Explicit Data
- Explicit: Ratings, likes, or feedback forms. NDCG evaluates how well you’re ranking items users explicitly rated highly.

🔗 Script for NDCG score on explicit dataset & algo comparison
- surpriselib is used for trying different algorithms.
- SVD++ performed best but took the longest to run.
- The NDCG@10 scores for all algorithms are low (below 0.07), which suggests that the recommendations are not great.
- NDCG is an offline metric. It doesn’t replace online metrics (KPIs), which need to be measured through an A/B test.
 
 
 
| Algorithm | NDCG@10 | Runtime (s) | 
|---|---|---|
| SVD | 0.0470 | 3.35 | 
| SVD++ | 0.0692 | 48.73 | 
| NMF | 0.0062 | 3.02 | 
| KNNBasic | 0.0004 | 28.34 | 
| KNNWithMeans | 0.0016 | 31.12 | 
| KNNWithZScore | 0.0014 | 33.73 | 
| KNNBaseline | 0.0010 | 37.54 | 
| SlopeOne | 0.0015 | 23.65 | 
| CoClustering | 0.0058 | 2.91 | 
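
A per-algorithm score like the ones in this table is usually an average of per-user NDCG values. The sketch below shows one way this might be computed, assuming predictions arrive as `(user, item, true_rating, estimated_rating)` tuples, which matches the first four fields of the predictions returned by Surprise's `algo.test()`. The linked script's actual procedure is not shown here and may differ; `mean_ndcg_at_k` and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def dcg_at_k(relevance, k):
    """DCG@k with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))

def mean_ndcg_at_k(predictions, k=10):
    """Average per-user NDCG@k over (user, item, true, estimated) tuples."""
    by_user = defaultdict(list)
    for user, _item, true_rating, est_rating in predictions:
        by_user[user].append((est_rating, true_rating))

    scores = []
    for rated in by_user.values():
        # Rank this user's test items by estimated rating, best first.
        ranked = [true for _est, true in
                  sorted(rated, key=lambda pair: pair[0], reverse=True)]
        idcg = dcg_at_k(sorted(ranked, reverse=True), k)
        if idcg > 0:  # assumed convention: skip users with no positive relevance
            scores.append(dcg_at_k(ranked, k) / idcg)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical predictions for two users.
predictions = [
    ("u1", "a", 5.0, 4.6), ("u1", "b", 2.0, 4.8), ("u1", "c", 4.0, 3.1),
    ("u2", "a", 3.0, 3.9), ("u2", "d", 5.0, 4.2),
]
print(round(mean_ndcg_at_k(predictions, k=10), 4))
```

This sketch ranks only the items each user rated in the test set. Ranking over the whole catalogue instead tends to give much lower values, so its output should not be compared directly with the table.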
- Implicit: Clicks, time spent (behavioural data), or purchases. NDCG helps interpret real-world engagement signals, highlighting the top actions you most want to rank first.
 
🔗 Script for NDCG score calculation on implicit dataset & comparison
The script prepares a linear rec-sys model on the publicly available lastfm data.
Running it gives the following results:
- NDCG@10 - All Products: 0.0025
- NDCG@10 - Top Played Products Only: 0.0022

These NDCG scores are low, which suggests weak recommendation quality. This is expected because implicit datasets are highly sparse and noisy. Also, the model we trained is a linear ALS (Alternating Least Squares) model.
ALS: a matrix factorisation model which assumes structured rating behaviour. Implicit datasets (clicks, plays, purchases, etc.) usually lack such structure.
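
With implicit data there are no graded ratings, so relevance is commonly treated as binary: an item is relevant if the user actually interacted with it in a held-out period. The sketch below shows how NDCG@k might be computed for one user under that assumption. The function name and the artist identifiers are hypothetical, and the linked script may define relevance differently.

```python
import math

def binary_ndcg_at_k(recommended, held_out, k=10):
    """NDCG@k where an item counts as relevant (1) iff it is in held_out."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1)
              if item in held_out)
    # Ideal case: all held-out items (at most k of them) fill the top ranks.
    ideal_hits = min(len(held_out), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical user: five recommended artists, two of which were later played.
recommended = ["artist_9", "artist_2", "artist_7", "artist_4", "artist_1"]
held_out = {"artist_2", "artist_1"}
print(round(binary_ndcg_at_k(recommended, held_out, k=5), 4))
```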
An improvement could be to try:
- Weighted confidence-based ALS
 - Hybrid model
 - Noise filtering before trying ALS
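
Two of these ideas can be sketched as preprocessing steps applied before ALS is trained. The confidence formula `c = 1 + alpha * r` follows the formulation commonly used for implicit-feedback matrix factorisation (Hu, Koren and Volinsky). The function names, the `min_plays` threshold, and the value of `alpha` are illustrative assumptions that would need tuning on real data.

```python
def filter_noise(plays, min_plays=2):
    """Drop (user, item) pairs with fewer than min_plays interactions."""
    return {pair: count for pair, count in plays.items() if count >= min_plays}

def confidence_weights(plays, alpha=40.0):
    """Map each raw count r to a confidence c = 1 + alpha * r."""
    return {pair: 1.0 + alpha * count for pair, count in plays.items()}

# Hypothetical play counts keyed by (user, artist).
plays = {
    ("u1", "artist_a"): 12,
    ("u1", "artist_b"): 1,   # a single play is treated as noise here
    ("u2", "artist_a"): 3,
}
cleaned = filter_noise(plays, min_plays=2)
weights = confidence_weights(cleaned, alpha=40.0)
print(weights)
```

The weights express how much the model should trust each observed interaction; a weighted ALS implementation would use them in its loss instead of treating all interactions equally.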
 
Managerial Perspective: Leveraging NDCG Insights
- Strategy: Align recommended items with high-relevance or high-value categories.
 - Goal Tracking: A higher NDCG is expected to go hand in hand with higher user satisfaction, retention, and ultimately revenue; confirm this against online KPIs.
 - Dashboard Integration: Combine NDCG with business KPIs (sales, CTR) to see direct impact of your recommendation algorithm adjustments.
 
New Metrics: Hitrate NDCG & Beyond
- Hitrate NDCG: A blend of immediate user “hits” and the ranking-based viewpoint.
 - Other Variants: MRR (Mean Reciprocal Rank), MAP (Mean Average Precision), etc.
 - Why Expand?: Different business objectives—some might care about quick discovery (hitrate), others about deep engagement (DCG-based).
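
The article does not define how "Hitrate NDCG" blends the two views, so the sketch below covers only the standard building blocks named above: hit rate@k and MRR. Function names and sample data are hypothetical.

```python
def hit_rate_at_k(recommended, relevant, k=10):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0

def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical users: (recommended list, set of relevant items).
users = [
    (["a", "b", "c"], {"b"}),       # first hit at rank 2
    (["d", "e", "f"], {"x"}),       # no hit
    (["g", "h", "i"], {"g", "i"}),  # first hit at rank 1
]
hit_rate = sum(hit_rate_at_k(rec, rel, k=3) for rec, rel in users) / len(users)
mrr = sum(reciprocal_rank(rec, rel) for rec, rel in users) / len(users)
print(hit_rate, mrr)
```

Hit rate only asks whether anything relevant was shown, while MRR rewards showing the first relevant item early. NDCG additionally accounts for every relevant item and its graded relevance.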
 
Conclusion
NDCG is not just a metric: it is a direct KPI through which one can see
- ranking performance
- user engagement

and connect them to business outcomes.