Announcement_18
Our papers on LM-based evaluation with human agreement guarantees (top 1.8%) and on benchmarking LLMs using real-world user queries (top 5%) have been accepted to ICLR 2025.