Patrick Fernandes (@psanfernandes)'s Twitter Profile
Patrick Fernandes

@psanfernandes

PhD Student @LTIatCMU & @istecnico
Previously research @Google, @Microsoft & @Unbabel

ID: 256442599

Link: http://patricksf.dev
Joined: 23-02-2011 10:08:07

173 Tweets

622 Followers

267 Following

José Maria Pombal (@zmprcp)

New paper out 🚀 Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models: arxiv.org/abs/2504.01001.

We present a framework and release a repository for creating reliable benchmarks for (V)LM tasks quickly and fully automatically.
José Maria Pombal (@zmprcp)

We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters!

Check out the models and training data on Huggingface: huggingface.co/collections/Un…
and our paper: arxiv.org/abs/2504.04953
Patrick Fernandes (@psanfernandes)

Come and chat with us about a powerful (but surprisingly underused) *test-time compute* scaling technique to improve your LLMs!