Graph Augmentation for Intersectional Unfairness Mitigation: A Study across Dataset Scales and Interaction Densities

ACM Transactions on Recommender Systems (ACM TORS) 2026

Abstract

Recent work on fairness-aware graph collaborative filtering (GCF) has shown the effectiveness of graph augmentation as a post-processing strategy for mitigating consumer unfairness. However, most studies remain confined to binary fairness setups and operate under limited experimental conditions, often relying on sparse and small-scale datasets. In this paper, we extend our fairness-aware augmentation method to address intersectional unfairness across demographic subgroups, a setting where the intersection of multiple sensitive attributes leads to fine-grained subgroups. To this end, we reformulate the fairness objective to incorporate intersectional demographic groups and evaluate our extended method across interaction configurations that vary in density and scale. Our results reveal that the effectiveness of fair graph augmentations is model-dependent and sensitive to dataset properties. We show that the edges selected during augmentation tend to concentrate around interpretable structural patterns driven by the connected nodes’ characteristics. Furthermore, analyzing how these augmented edges differ across graph-level attributes offers actionable insights into the potential benefits of fairness-oriented graph modifications. Finally, we compare our method with recent fairness-aware baselines, explore the impact of augmenting different graph regions, and assess our mitigation strategy under scenarios with minimal unfairness.

Motivation

Graph collaborative filtering (GCF) powered by graph neural networks (GNNs) has become a leading paradigm for personalized recommendation. As these systems shape access to information and opportunities, ensuring they treat all user groups equitably is increasingly important. Prior work on fairness-aware GCF, however, suffers from three compounding limitations:

  • Binary fairness setups: Most studies define consumer groups along a single binary attribute (e.g., Male vs. Female), ignoring how multiple attributes interact to produce compounded disadvantages.
  • Intersectionality gap: The intersection of multiple sensitive attributes (e.g., Gender × Age) creates fine-grained subgroups — Older Females, Younger Males, etc. — whose distinct fairness needs are invisible to binary formulations.
  • Small-scale evaluation: Experiments are typically conducted on sparse, small datasets, leaving open whether findings generalise to denser, large-scale interaction graphs.

Goal: Extend fairness-aware graph augmentation to the intersectional setting and rigorously evaluate it across five datasets spanning a wide range of densities and scales, from 1.90% to 35.01% interaction density.

Method: Intersectionally Fair Graph Augmentation

We reformulate the fairness objective of graph augmentation to operate over intersectional demographic groups formed by combining two or more sensitive attributes. The core concepts are:

  • IDPR (Intersectional Demographic Parity in Recommendation): A fairness criterion requiring that recommendation utility (NDCG) be equal across all intersectional subgroups simultaneously, rather than just between two groups.
  • ε-IDPR: A practical relaxation of the strict IDPR constraint that allows a tolerance ε in the parity requirement, making the optimisation tractable while retaining meaningful fairness guarantees.
  • Loss function: The augmentation is guided by a two-term objective — a fairness loss Lfair minimising the utility disparity Δ across intersectional subgroups, and a distance loss Ldist controlling how much the augmented graph departs from the original.
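The objective above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the disparity Δ is taken as the largest pairwise NDCG gap across subgroups, and the distance term and its weight `lambda_dist` are assumptions for this example.

```python
# Illustrative sketch of the two-term augmentation objective.
# `ndcg_per_group` maps each intersectional subgroup to its mean NDCG.

def utility_disparity(ndcg_per_group):
    """Delta: largest pairwise NDCG gap across intersectional subgroups."""
    values = list(ndcg_per_group.values())
    return max(values) - min(values)

def satisfies_eps_idpr(ndcg_per_group, eps):
    """epsilon-IDPR: parity holds up to a tolerance eps."""
    return utility_disparity(ndcg_per_group) <= eps

def augmentation_loss(ndcg_per_group, n_changed_edges, n_original_edges,
                      lambda_dist=0.1):
    """L = L_fair + lambda * L_dist (illustrative weighting)."""
    l_fair = utility_disparity(ndcg_per_group)
    l_dist = n_changed_edges / n_original_edges  # departure from original graph
    return l_fair + lambda_dist * l_dist
```

For instance, subgroup NDCGs of {0.30, 0.25, 0.20, 0.18} give Δ = 0.12, which satisfies ε-IDPR for ε = 0.15 but not for ε = 0.10.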

Sampling Policies

To make augmentation tractable, candidate edges are drawn from a restricted pool defined by sampling policies applied independently on the user and item side:

Policy   Side   Description
ZN       User   Users with no relevant items in the top-k list (NDCG@k = 0)
FR       User   Users furthest from the advantaged group in graph distance
IR       User   Users selected by inter-group distance criteria
IP       Item   Items most preferred by the disadvantaged group
IT       Item   Items with high inter-group transferability
PR       Item   Items selected by popularity-relative criteria
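As an illustration, the ZN (user-side) and IP (item-side) policies can be sketched as below. The data structures (an NDCG@k map and a user-item interaction list) and the `top_n` cut-off are assumptions for this example, not the paper's code.

```python
# Minimal sketch of two sampling policies: ZN on the user side, IP on the item side.
from collections import Counter

def zn_users(ndcg_at_k):
    """ZN: users with no relevant items in their top-k list (NDCG@k = 0)."""
    return [u for u, score in ndcg_at_k.items() if score == 0.0]

def ip_items(interactions, disadvantaged_users, top_n=10):
    """IP: items most interacted with by users in the disadvantaged group."""
    counts = Counter(i for u, i in interactions if u in disadvantaged_users)
    return [item for item, _ in counts.most_common(top_n)]
```

Candidate edges for augmentation would then be drawn from the cross product of the selected user pool and item pool.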

Experimental Setup

We evaluate across five datasets that collectively cover sparse, medium-density, and dense interaction graphs in the music, movie, and short-video domains, using two intersectional sensitive attribute pairs (Gender × Age and One-hot Feat0 × One-hot Feat13).

Datasets

                          LFM1M          ML1M           KRECS
# Users                   4,546          6,040          1,401
# Items                   12,492         3,706          3,060
# Interactions            1,082,132      1,000,209      1,502,531
Min. degree per user      20             20             229
Density                   1.90%          4.47%          35.01%
Sensitive attributes      Gender × Age   Gender × Age   One-hot Feat0 × One-hot Feat13
Subgroup M|Y / 0|0        44.3%          41.2%          58.7%
Subgroup M|O / 1|0        34.2%          30.5%          31.8%
Subgroup F|Y / 0|1        16.5%          15.5%          5.2%
Subgroup F|O / 1|1        5.1%           12.8%          4.3%
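Density here is the fill rate of the interaction matrix, i.e. interactions / (users × items). A quick sanity check against the reported figures (small rounding differences may remain, since the published densities may be computed on slightly different preprocessed counts):

```python
# Verify reported densities as interactions / (users * items), to ~0.1%.

def density(n_interactions, n_users, n_items):
    return n_interactions / (n_users * n_items)

reported = {
    "LFM1M": (1_082_132, 4_546, 12_492, 0.0190),
    "ML1M":  (1_000_209, 6_040, 3_706, 0.0447),
    "KRECS": (1_502_531, 1_401, 3_060, 0.3501),
}
for name, (inter, users, items, expected) in reported.items():
    assert abs(density(inter, users, items) - expected) < 0.001, name
```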

GCF Models

We evaluate five state-of-the-art graph collaborative filtering models: HMLET, LightGCN, NGCF, SGL, and XSimGCL.

RQ1: Intersectional Fairness

The first research question asks whether our fairness-aware augmentation effectively reduces utility disparity Δ across intersectional subgroups on the three base datasets (LFM1M, ML1M, KRECS).

Table 2: Recommendation utility (NDCG) and utility disparity (Δ) for the base model and after augmentation across five GCF models and three datasets. Lower Δ indicates better intersectional fairness.
  • Model-dependence: The effectiveness of fair graph augmentation varies substantially across GCF models — augmentation reliably reduces Δ on some architectures while having limited or no impact on others.
  • Utility preservation: In settings where fairness improves, recommendation utility (NDCG) is largely preserved or increased, showing the augmentation does not trade off quality against fairness in the intersectional case.
  • Dataset sensitivity: Results differ across LFM1M, ML1M, and KRECS, pointing to a strong interaction between dataset properties (density, scale) and mitigation effectiveness.

RQ2: Scale and Density

To isolate the effect of interaction density and dataset scale, we introduce two additional datasets derived from MovieLens 1M and KuaiRec with higher k-core thresholds, producing denser interaction graphs.
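A k-core filter of this kind can be sketched as iterative pruning of the bipartite interaction graph: repeatedly drop users and items whose degree falls below k until the graph stabilises. This is a minimal illustration, not the paper's preprocessing code.

```python
# Iterative k-core filtering on a bipartite (user, item) interaction list.
from collections import Counter

def k_core(interactions, k):
    """Keep only edges whose user and item both retain degree >= k."""
    edges = set(interactions)
    while True:
        u_deg = Counter(u for u, _ in edges)
        i_deg = Counter(i for _, i in edges)
        kept = {(u, i) for u, i in edges if u_deg[u] >= k and i_deg[i] >= k}
        if kept == edges:  # fixed point reached
            return kept
        edges = kept
```

Raising k removes low-activity users and unpopular items, which is exactly what makes the resulting graphs denser.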

                          ML1MD          KRECB
# Users                   2,595          7,101
# Items                   1,829          8,720
# Interactions            741,478        10,155,233
K-core threshold          1              10
Min. degree per user      110            79
Density                   15.61%         16.40%
Sensitive attributes      Gender × Age   One-hot Feat0 × One-hot Feat13
Subgroup M|Y / 0|0        46.7%          55.4%
Subgroup M|O / 1|0        29.0%          34.1%
Subgroup F|Y / 0|1        14.0%          5.5%
Subgroup F|O / 1|1        10.3%          4.9%
Table 4: Utility (NDCG) and disparity (Δ) for the base model and after augmentation on the denser ML1MD and KRECB datasets across all five GCF models.
  • Density amplifies model-dependence: The pattern of which models benefit from augmentation shifts when density increases, confirming that dataset properties interact non-trivially with the augmentation mechanism.
  • Large-scale feasibility: Our method successfully operates on KRECB with over 10 million interactions, demonstrating scalability beyond the small-scale datasets used in prior work.

RQ3: Augmentation Interpretability

We analyse the structural characteristics of the edges added during augmentation using three graph-level metrics: node degree (DEG), degree-type ratio (DTY), and inter-group distance (IGD). Edges are grouped into quartiles Q1–Q4 per metric and per intersectional subgroup.
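The quartile grouping can be sketched as follows; the metric values and the bucket-assignment rule (half-open intervals at the quartile cut points) are illustrative assumptions for this example.

```python
# Bin each edge's metric value (e.g., endpoint degree) into quartiles Q1-Q4.
from bisect import bisect_right
from statistics import quantiles

def quartile_buckets(metric_values):
    """Return a Q1..Q4 label for each value, split at the quartile cut points."""
    cuts = quantiles(metric_values, n=4)  # three cut points
    return [f"Q{bisect_right(cuts, v) + 1}" for v in metric_values]
```

Applying this per metric (DEG, DTY, IGD) and per intersectional subgroup yields the distributions shown in Figures 2 and 3.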

Figure 2: Distribution of added edges over quartiles (Q1–Q4) for Gender×Age intersectional groups (Older Males, Younger Males, Older Females, Younger Females) across ML1M, LFM1M, and ML1MD, split by DEG, DTY, and IGD graph metrics.
Figure 3: Distribution of added edges over quartiles (Q1–Q4) for One-hot Feat0 × One-hot Feat13 groups (0|0, 0|1, 1|0, 1|1) in KRECB and KRECS, split by DEG, DTY, and IGD graph metrics.
  • Structural concentration: Augmented edges do not distribute uniformly — they concentrate in interpretable structural regions of the graph, driven by the degree and type characteristics of the connected nodes.
  • Subgroup-specific patterns: Different intersectional subgroups receive edges in different graph regions, revealing that the augmentation implicitly adapts to each group's structural position.
  • Consistent across dataset types: The DEG/DTY/IGD patterns are broadly consistent between Gender×Age datasets and One-hot attribute datasets, suggesting the interpretability findings generalise across attribute types.

RQ4: Distributional Shift

We investigate whether the gap between validation-set and test-set fairness gains can be predicted from the distributional shift between the two splits, measured via energy distance.

Figure 4: Scatter plot of validation–test energy distance versus the ratio of fairness gains (test / validation). Spearman ρ = −0.19 (p = 0.000343).

Finding: A statistically significant negative correlation (Spearman ρ = −0.19, p = 0.000343) is observed between the validation–test energy distance and the ratio of fairness gains. Larger distributional shifts between validation and test sets are associated with weaker generalisation of fairness improvements from validation to test, providing a practical signal for anticipating mitigation reliability.
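Both quantities can be computed directly from their standard definitions. The sketch below assumes 1-D samples and tie-free values, and is an illustration rather than the paper's analysis code.

```python
# 1-D energy distance between two samples, and Spearman's rho as the
# Pearson correlation of the rank sequences (valid when there are no ties).
from itertools import product
from statistics import mean

def energy_distance(x, y):
    """D(X, Y) = sqrt(2 E|X-Y| - E|X-X'| - E|Y-Y'|)."""
    exy = mean(abs(a - b) for a, b in product(x, y))
    exx = mean(abs(a - b) for a, b in product(x, x))
    eyy = mean(abs(a - b) for a, b in product(y, y))
    return (2 * exy - exx - eyy) ** 0.5

def spearman_rho(x, y):
    """Spearman correlation via rank transform + Pearson formula (no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / var
```

Identical validation and test distributions give an energy distance of zero; as the splits drift apart the distance grows, which is the x-axis of Figure 4.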

Additional Analyses

Sampling Policy Ablation

We ablate all 15 combinations of user-side and item-side sampling policies across five GCF models and five datasets to identify which policy combinations are most consistently effective.

Figure 5: Heatmap of utility disparity Δ across 5 GCF models × 5 datasets × 15 sampling policy combinations (validation set). Darker cells indicate lower disparity (better fairness).

Leaky Coefficient Ablation

We study how the leaky coefficient α — which controls the sharpness of the augmentation objective — affects the resulting fairness level Δ across all five datasets.

Figure 6: Fairness level Δ as a function of leaky coefficient α (0.05, 0.1, 0.2, 0.5) across all five datasets, showing sensitivity of the mitigation to this hyperparameter.

Comparison with ITFR

We compare our method against the ITFR fairness-aware baseline on LightGCN across all five datasets; asterisked (*) values signal low absolute utility.

          ITFR NDCG↑   ITFR Δ↓   Ours NDCG↑   Ours Δ↓
LFM1M     16.49        0.56      17.67        0.32
ML1M      11.85        0.28      12.62        0.25
ML1MD     16.64        0.42      20.33        0.64
KRECS     *4.29        0.19      *6.29        0.29
KRECB     *6.16        0.10      *5.54        0.03

Our method achieves superior utility on four of five datasets and the best fairness on three of five, while ITFR wins on fairness for ML1MD and KRECS. No single method dominates across all configurations, highlighting the importance of dataset-aware method selection.

Key Contributions

  • Intersectional Fairness Formulation: First extension of fairness-aware graph augmentation beyond binary setups, introducing the IDPR criterion and its ε-IDPR relaxation for intersectional demographic groups.
  • Large-scale Evaluation: Comprehensive experiments across five datasets spanning densities from 1.90% to 35.01% and up to 10 million interactions, demonstrating scalability and dataset sensitivity of the method.
  • Interpretable Augmentation Patterns: Structural analysis via DEG, DTY, and IGD metrics reveals that augmented edges concentrate in interpretable graph regions, offering actionable insights into fairness-oriented graph modifications.
  • Distributional Shift Indicator: Evidence that energy distance between validation and test distributions predicts generalisation of fairness gains (Spearman ρ = −0.19), providing a practical diagnostic signal.
  • Ablation and Baseline Comparison: Thorough ablation of 15 sampling policy combinations and comparison with ITFR, clarifying conditions under which each approach is preferable.

BibTeX

@article{boratto2026intersectional,
  author = {Boratto, Ludovico and Fabbri, Francesco and Fenu, Gianni and Marras, Mirko and Medda, Giacomo},
  title = {Graph Augmentation for Intersectional Unfairness Mitigation: A Study across Dataset Scales and Interaction Densities},
  journal = {ACM Transactions on Recommender Systems},
  year = {2026},
  doi = {10.1145/3798097},
  url = {https://doi.org/10.1145/3798097}
}