👉
Eval Research and GitHub Resources
PromptBench: A Unified Library for Evaluating and Understanding Large Language Models, from Microsoft, Leaderboards
EleutherAI - framework for few-shot evaluation of autoregressive language models
Zeno Evaluation Hub