🏆 Model Leaderboard
Definitions of the Metrics
• Stance distance: |stance(human message) − stance(LLM message)|.
• Interpretation: the smaller the distance, the closer the LLM's messages are to the human's in stance.
• Opinion Diversity: the standard deviation of the stances of the people within the same group g.
• Interpretation: the smaller the value, the more similar the group members' stances are to one another.
• The change in Opinion Diversity from the start to the end
• ΔSD_tweet(g) = SD_tweet_final(g) − SD_tweet_init(g)
• ΔSD_private(g) = SD_private_final(g) − SD_private_init(g)
• Interpretation: negative values indicate opinion convergence after the debate, and positive values indicate opinion divergence.
Note: The stance labels are obtained by classification onto a 6-point scale: {Certainly disagree (−2.5), Probably disagree (−1.5), Lean disagree (−0.5), Lean agree (+0.5), Probably agree (+1.5), Certainly agree (+2.5)} (see the paper for details).
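The metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are made up, and it assumes the sample standard deviation (the paper may use the population version).

```python
import statistics

# The 6-point stance scale from the note above.
STANCE_SCALE = {
    "Certainly disagree": -2.5,
    "Probably disagree": -1.5,
    "Lean disagree": -0.5,
    "Lean agree": 0.5,
    "Probably agree": 1.5,
    "Certainly agree": 2.5,
}

def stance_distance(human_stance: float, llm_stance: float) -> float:
    """Absolute difference between the human's and the LLM's message stances."""
    return abs(human_stance - llm_stance)

def opinion_diversity(stances: list[float]) -> float:
    """Standard deviation of the stances within one group."""
    return statistics.stdev(stances)

def delta_sd(init_stances: list[float], final_stances: list[float]) -> float:
    """Change in within-group opinion diversity from the start of the debate
    to the end; negative means convergence, positive means divergence."""
    return opinion_diversity(final_stances) - opinion_diversity(init_stances)
```

For example, a group that starts at stances [−2.5, 0.5, 2.5] and ends at [−0.5, 0.5, 1.5] gets a negative ΔSD, i.e. its opinions converged.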
Utterance-level Evaluation of Role-playing LLM Agents
Depth Topics - Avg Stance Δ (Full Conversation Simulation)
Average stance change for Depth topics in Full Conversation Simulation. Bars are sorted in ascending order; lower values indicate smaller shifts in stance magnitude.
Group-Level Alignment in Opinion Dynamics: LLM Groups vs. Human Groups (Depth Topics)
Change in within-group opinion diversity for humans and RPLA simulations on Depth Topics. Bars show averages over groups; more negative values indicate stronger convergence and positive values indicate divergence. Error bars denote the standard error of the mean across groups.
Fine-tuning with DPO on the Dataset Increases Group-Level Alignment of LLM Opinion Dynamics on Unseen Topics
Change in within-group opinion diversity after SFT/DPO post-training.
Note: Results are reported only for Breadth topics, since only the Breadth-topic data is large enough to support proper fine-tuning.