About ChronoPlay
Understanding the dual dynamics that shape game knowledge and player behavior over time.
Dual Dynamics in Gaming
Games evolve through two concurrent dynamics: Knowledge Evolution, where game content continuously updates through patches, expansions, and balance changes; and User Interest Drift, where player attention shifts across topics over time. ChronoPlay captures both dynamics to create temporally-aware benchmarks.
Dual-Source Data Synthesis Pipeline
ChronoPlay synthesizes benchmark data from two complementary sources: authoritative game knowledge bases (wikis, patch notes) and player community discussions (forums, social media). This dual-source approach ensures both factual accuracy and real-world relevance.
Dual-Dynamic Update Mechanism
The framework continuously evolves through two mechanisms: knowledge updates based on named entity recognition (NER), which track game content changes, and interest drift detection, which monitors shifting player attention patterns. This dual-dynamic mechanism keeps the benchmark aligned with the ever-changing game landscape.
Leaderboard
Comparing RAG system performance across all three game domains.
How to Submit Your Results
- Fork this repository
- Create a new JSON file in the submissions/ directory (e.g., my_system.json)
- Fill in your results in the following format:

  {
    "system_name": "My RAG System v1.0",
    "description": "Dense retrieval with BM25 reranking and GPT-4 for generation",
    "games": {
      "dune": {
        "topk": 3,
        "recall": 0.85,
        "f1": 0.78,
        "ndcg": 0.82,
        "correctness": 0.88,
        "faithfulness": 0.91
      },
      "dying_light_2": {
        "topk": 3,
        "recall": 0.83,
        "f1": 0.76,
        "ndcg": 0.80,
        "correctness": 0.86,
        "faithfulness": 0.89
      },
      "pubg_mobile": {
        "topk": 3,
        "recall": 0.81,
        "f1": 0.74,
        "ndcg": 0.78,
        "correctness": 0.84,
        "faithfulness": 0.87
      }
    }
  }

- Submit a Pull Request
- After the PR is merged, the leaderboard will be automatically updated
Note: All metric scores (recall, f1, ndcg, correctness, faithfulness) must be decimals between 0 and 1. The R, G, and Total Score values are calculated automatically by the frontend. You must provide results for all three games.
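Before opening a Pull Request, it may help to check a submission file locally against the rules in the note. The following is a minimal sketch, not the repository's own validation: the function names validate_submission and validate_file are hypothetical, the game keys and metric names are copied from the example format, and the requirement that topk be a positive integer is an assumption based on that example.

```python
import json

# Game keys and metric names are taken from the example submission format.
REQUIRED_GAMES = ("dune", "dying_light_2", "pubg_mobile")
REQUIRED_METRICS = ("recall", "f1", "ndcg", "correctness", "faithfulness")


def validate_submission(submission):
    """Return a list of problems; an empty list means every check passed."""
    if not isinstance(submission, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    for field in ("system_name", "description"):
        value = submission.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    games = submission.get("games")
    if not isinstance(games, dict):
        return problems + ["'games' must be an object"]

    for game in REQUIRED_GAMES:
        result = games.get(game)
        if not isinstance(result, dict):
            problems.append(f"missing results for game: {game}")
            continue

        # Assumption: topk is a positive integer, as in the example.
        # bool is a subclass of int in Python, so exclude it explicitly.
        topk = result.get("topk")
        if isinstance(topk, bool) or not isinstance(topk, int) or topk < 1:
            problems.append(f"{game}.topk must be a positive integer")

        for metric in REQUIRED_METRICS:
            value = result.get(metric)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 0 <= value <= 1:
                problems.append(f"{game}.{metric} must be a number between 0 and 1")
    return problems


def validate_file(path):
    """Load a submission JSON file and run the checks above."""
    with open(path, encoding="utf-8") as handle:
        return validate_submission(json.load(handle))
```

Calling validate_file("submissions/my_system.json") returns an empty list when the file has both text fields, all three games, and every metric in range; otherwise each entry in the list describes one problem to fix.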
Citation
If you use this leaderboard or the ChronoPlay benchmark in your research, please cite our paper:
@article{he2025chronoplay,
title={ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks},
author={He, Liyang and Zhang, Yuren and Zhu, Ziwei and Li, Zhenghui and Tong, Shiwei},
journal={arXiv preprint arXiv:2510.18455},
year={2025}
}