Practical Perspectives of Consumer Fairness in Recommendation

Ludovico Boratto, Gianni Fenu, Mirko Marras, Giacomo Medda
Information Processing and Management (IPM 2023)

Abstract

In recent years, there has been an increasing number of mitigation procedures against consumer unfairness in personalized rankings. However, the experimental protocols adopted so far for evaluating a mitigation procedure were often fundamentally different (e.g., with respect to the fairness definitions, datasets, data splits, and evaluation metrics) and limited to a narrow set of perspectives (e.g., focusing on a single demographic attribute and/or not reporting any analysis on efficiency). This situation makes it challenging for scientists to make an informed decision about which mitigation procedure better suits their practical setting. In this paper, we investigated the properties a given mitigation procedure against consumer unfairness should be evaluated on, to provide a more holistic view of its effectiveness. We first identified eight technical properties and evaluated the extent to which existing mitigation procedures against consumer unfairness met these properties, qualitatively and quantitatively (when possible), on two public datasets. Then, we outlined the main trends and open issues that emerged from our multi-dimensional analysis and provided key practical recommendations for future research.

Motivation

Recommender systems have been shown to lead to discriminatory outcomes affecting consumers. While numerous mitigation procedures have been proposed, the experimental protocols used to evaluate them differ fundamentally in their fairness definitions, datasets, data splits, and evaluation metrics. This makes it challenging for practitioners to select the most suitable mitigation procedure for their specific setting.

Eight Technical Properties

We propose a comprehensive evaluation framework based on eight key properties that any mitigation procedure should be assessed on:

  • Applicability: The range of recommendation models the mitigation can be applied to
  • Coherence: Whether the mitigation reduces unfairness without reversing disparities toward other groups
  • Consistency: The stability of category representation between interactions and recommendations
  • Data Robustness: How the mitigation handles data imbalances and popularity biases
  • Reproducibility: Whether the results can be replicated with available code and documentation
  • Scalability: Computational efficiency across different dataset sizes
  • Trade-off: The balance between recommendation utility and fairness improvement
  • Transferability: Performance consistency across different demographic attributes and datasets
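
The Coherence property can be made concrete with a small check. The sketch below is illustrative only and is not the paper's implementation: it assumes unfairness is measured as the signed difference in mean per-user utility between two groups, and that a mitigation is "coherent" when this gap shrinks in magnitude without changing sign. The function names and the utility values are hypothetical.

```python
from statistics import mean


def signed_gap(utility_a, utility_b):
    """Signed difference in mean utility between two demographic groups."""
    return mean(utility_a) - mean(utility_b)


def is_coherent(gap_before, gap_after):
    """True if the gap shrinks without flipping toward the other group.

    Assumption: 'coherent' means a smaller gap in magnitude that keeps
    its original sign (or reaches zero).
    """
    shrinks = abs(gap_after) < abs(gap_before)
    same_side = gap_before * gap_after >= 0
    return shrinks and same_side


# Hypothetical per-user NDCG values, made up for illustration only.
before = signed_gap([0.30, 0.34, 0.32], [0.24, 0.26, 0.25])      # favors group A
after_ok = signed_gap([0.29, 0.31, 0.30], [0.27, 0.29, 0.28])    # smaller gap, same side
after_flip = signed_gap([0.24, 0.26, 0.25], [0.27, 0.29, 0.28])  # now favors group B

print(is_coherent(before, after_ok))
print(is_coherent(before, after_flip))
```

In this toy example the first mitigation narrows the gap while group A stays slightly ahead, so it passes the check; the second one also has a smaller gap in absolute terms but moves the advantage to group B, so it fails.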

Experimental Setup

We evaluated existing mitigation procedures on two public datasets with consumer sensitive attributes:

  • ML1M: 6,040 users, 3,952 items, ~1M ratings (Gender: 71.7% M / 28.3% F; Age: 56.6% <35 / 43.4% ≥35)
  • LFM1K: 268 users, 51,609 items, ~200K ratings (Gender: 57.8% M / 42.2% F; Age: 57.8% <25 / 42.2% ≥25)
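
The statistics listed above already say something about how different the two datasets are. As a rough sketch, the snippet below derives the interaction density and the majority-to-minority group ratio from those figures. The rating counts are the rounded values reported above (~1M and ~200K), so the results are approximate, and the helper names are hypothetical.

```python
def density(n_users, n_items, n_interactions):
    """Share of the user-item matrix that is filled."""
    return n_interactions / (n_users * n_items)


def imbalance_ratio(majority_share, minority_share):
    """How many majority-group users there are per minority-group user."""
    return majority_share / minority_share


# Figures copied from the dataset list; interaction counts are rounded.
datasets = {
    "ML1M": {"users": 6040, "items": 3952, "interactions": 1_000_000,
             "gender": (71.7, 28.3), "age": (56.6, 43.4)},
    "LFM1K": {"users": 268, "items": 51609, "interactions": 200_000,
              "gender": (57.8, 42.2), "age": (57.8, 42.2)},
}

for name, d in datasets.items():
    dens = density(d["users"], d["items"], d["interactions"])
    gender = imbalance_ratio(*d["gender"])
    age = imbalance_ratio(*d["age"])
    print(f"{name}: density={dens:.2%}, gender ratio={gender:.2f}, age ratio={age:.2f}")
```

With these rounded counts, ML1M comes out at roughly 4% density and LFM1K at roughly 1.4%, and the gender split is considerably more skewed in ML1M than in LFM1K.
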

Figure 1: [Consistency] Category equity score (CES) distribution across item categories before (Orig) and after (Mit) applying Burke et al.'s mitigation. The closer to 1, the more similar the category representation between interactions and recommendations.

Figure 2: [Data Robustness] User interaction, recommendation, and relevant recommendations drift across item groups formed based on their popularity. Each tick represents a group of 1,000 items with similar popularity.

Figure 3: [Trade-off] Gain/loss in recommendation utility (NDCG), equity (Demographic Parity), and independence (Kolmogorov-Smirnov) after applying mitigations. Positive NDCG percentages indicate utility gain; negative DP/KS percentages indicate fairness improvement.
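
The quantities in Figure 3 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the paper's evaluation code: Demographic Parity is taken as the absolute difference in mean per-user utility between two groups, the Kolmogorov-Smirnov statistic is the largest gap between the two groups' empirical utility distributions, and gain/loss is the percentage change relative to the unmitigated model. All function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def demographic_parity(util_a, util_b):
    """Absolute difference in mean utility between two groups (assumed definition)."""
    return abs(mean(util_a) - mean(util_b))


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of values in the sample that are <= x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


def relative_change(before, after):
    """Percentage change of a metric after mitigation, relative to before."""
    return 100.0 * (after - before) / before


# Hypothetical per-user NDCG values, made up for illustration only.
orig_a, orig_b = [0.30, 0.34, 0.32], [0.24, 0.26, 0.25]
mit_a, mit_b = [0.29, 0.31, 0.30], [0.27, 0.29, 0.28]

ndcg_change = relative_change(mean(orig_a + orig_b), mean(mit_a + mit_b))
dp_change = relative_change(demographic_parity(orig_a, orig_b),
                            demographic_parity(mit_a, mit_b))
print(f"NDCG: {ndcg_change:+.1f}%  DP: {dp_change:+.1f}%")
```

Under this sign convention a positive NDCG change is a utility gain and a negative DP or KS change is a fairness improvement, which matches how the figure is read. For real experiments, `scipy.stats.ks_2samp` provides the same statistic together with a p-value.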

Main Findings

  • Pre-processing approaches show the highest applicability: Data transformation techniques can be applied regardless of the recommendation model
  • Trade-offs are unavoidable but manageable: Most mitigations show some utility loss, but several achieve favorable trade-offs
  • Transferability remains challenging: Mitigations effective for one demographic attribute (e.g., gender) may not transfer to others (e.g., age)
  • Reproducibility is a concern: Many procedures lack publicly available code or sufficient documentation
  • Data robustness varies significantly: Some mitigations amplify popularity biases while reducing demographic unfairness
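
To check whether a mitigation amplifies popularity bias, one option is to bucket items by popularity and compare how interactions and recommendations spread over the buckets, in the spirit of Figure 2. The sketch below is an assumption-laden illustration: items are ranked by training interaction count in descending order and cut into fixed-size groups (1,000 in the figure, 2 in the toy example), and "drift" is taken as the difference between the recommendation share and the interaction share of each group. The helper names and toy data are hypothetical.

```python
from collections import Counter


def popularity_groups(interactions, group_size=1000):
    """Map each item to a group index; group 0 holds the most popular items."""
    counts = Counter(item for _, item in interactions)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: rank // group_size for rank, item in enumerate(ranked)}


def group_shares(items, item_to_group):
    """Fraction of the given item occurrences that falls into each group."""
    tally = Counter(item_to_group[i] for i in items if i in item_to_group)
    total = sum(tally.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in sorted(tally.items())}


# Toy (user, item) interactions: item "a" is the most popular, "d" the least.
interactions = ([("u", "a")] * 4 + [("u", "b")] * 3
                + [("u", "c")] * 2 + [("u", "d")])
recommended = ["a", "a", "b", "a", "c"]

groups = popularity_groups(interactions, group_size=2)
inter_share = group_shares([item for _, item in interactions], groups)
rec_share = group_shares(recommended, groups)

# Positive drift means the group is over-represented in recommendations.
drift = {g: rec_share.get(g, 0.0) - inter_share[g] for g in inter_share}
print(drift)
```

Running the same comparison on the recommendations produced before and after a mitigation would show whether the mitigation shifts exposure further toward the most popular groups.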

Practical Recommendations

Based on our multi-dimensional analysis, we provide key recommendations for researchers and practitioners:

  • Evaluate mitigations across multiple demographic attributes, not just one
  • Report both utility and fairness metrics to understand trade-offs
  • Consider data characteristics (size, sparsity, imbalance) when selecting mitigations
  • Release code and detailed experimental protocols to ensure reproducibility
  • Test on multiple datasets to assess transferability

BibTeX

@article{boratto2023practical,
  author = {Boratto, Ludovico and Fenu, Gianni and Marras, Mirko and Medda, Giacomo},
  title = {Practical perspectives of consumer fairness in recommendation},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103208},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103208},
  publisher = {Elsevier}
}