¹Tencent  ²The Chinese University of Hong Kong  ³Independent Researcher
† Corresponding author: shiweitong@tencent.com
ChronoPlay is a novel framework for the automated and continuous generation of game Retrieval-Augmented Generation (RAG) benchmarks. This leaderboard evaluates RAG systems across three popular games: Dune: Awakening, Dying Light 2, and PUBG Mobile.
📊 Evaluation Methodology: While the original paper evaluates models across different temporal segments to capture the dual dynamics of game evolution, this leaderboard presents a holistic evaluation for each game. Each system is assessed on the complete game dataset to provide a comprehensive measure of overall model performance, making it easier to compare different RAG approaches at a glance.
After the PR is merged, the leaderboard is updated automatically.
Note: All metric scores (recall, f1, ndcg, correctness, faithfulness) must be decimals between 0 and 1. The R, G, and Total Score values are calculated automatically by the frontend, so you do not need to submit them. You must provide results for all three games.
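Before opening a PR, it may help to check a submission against the two rules above. The following is a minimal sketch, not the leaderboard's own tooling. The five metric names and the three game names come from this page, but the layout of the results (a dictionary keyed by game name, each holding a dictionary of metric scores) and the function name `validate_submission` are assumptions made for illustration.

```python
# Hypothetical pre-submission check; the dict layout below is an assumption,
# not the leaderboard's documented submission format.
GAMES = ("Dune: Awakening", "Dying Light 2", "PUBG Mobile")
METRICS = ("recall", "f1", "ndcg", "correctness", "faithfulness")


def validate_submission(results):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for game in GAMES:
        scores = results.get(game)
        if scores is None:
            # Results are required for all three games.
            problems.append(f"missing game: {game}")
            continue
        for metric in METRICS:
            value = scores.get(metric)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{game}: {metric} is missing or not a number")
            elif not 0.0 <= value <= 1.0:
                # Catches percentages such as 85.0 submitted instead of 0.85.
                problems.append(f"{game}: {metric}={value} is outside [0, 1]")
    return problems
```

Calling `validate_submission(results)` on a complete, correctly scaled submission returns an empty list. A common mistake this would flag is reporting percentages (for example 85.0) instead of decimals (0.85).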
📚 Citation
If you use this leaderboard or the ChronoPlay benchmark in your research, please cite our paper:
@article{he2025chronoplay,
title={ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks},
author={He, Liyang and Zhang, Yuren and Zhu, Ziwei and Li, Zhenghui and Tong, Shiwei},
journal={arXiv preprint arXiv:2510.18455},
year={2025}
}