AitorSoroa (@aitor57)'s Twitter Profile
AitorSoroa

@aitor57

ID: 297848855

Joined: 13-05-2011 06:53:21

623 Tweets

130 Followers

183 Following

DSN - Data Science Nigeria (@dsn_ai_network)

The highly anticipated #NeurIPS2024 conference, one of the largest in Machine Learning and computational neuroscience, kicks off today! Over the coming days, we’ll spotlight groundbreaking research being presented, starting with “BertaQA: How much do Language Models know about

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

Looking for experts in artificial intelligence? At HiTZ zentroa you'll find plenty of women! #M8 #martxoak8 (Many more of us are missing from the photo!)

Oscar Sainz (@osainz59)

It is very important that technologies capable of having an impact on society are developed openly. At HiTZ zentroa (UPV/EHU) we work toward that goal, using openly licensed data to teach the Basque language and Basque culture to language models. Want to help with this challenge? See the 🧵

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

📊 Data from the first 5 days of #Ebaluatoia! 📊 775+ users and 6000+ submissions! 🚀 Thank you all! 💕 Challenge: reach 20000 submissions before April 2! 🕒 Go to ebaluatoia.hitz.eus and ask your question!

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

🎉 Ebaluatoia is over! 🎉 In total, 1,680 people registered and we received 12,890 submissions! Thank you to everyone who took part! ebaluatoia.hitz.eus 📅 Heads up! The prize draw will take place on April 10 at 15:00 at the Faculty of Informatics, or live on HiTZ's YT channel!

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

Every Thursday, members of HiTZ zentroa meet at the HiTZ seminar to share updates on our research. This week, two thesis projects were presented: "Learning to Judge: Automated Multilingual Evaluation of LLM-Generated Text" by Irune Zubiaga and "Critical Questions Generation" by Blanca C-F.

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[1/7] #newHitzPaper
Many languages are underserved by open LLMs, which raises a question: what is the best way to produce open instruction-tuned LLMs for low-resource languages?
We obtained great results with a cost-effective option!
📰 arxiv.org/abs/2506.07597

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[2/7] 🤔 Why does this matter?
• Most LLMs excel in English but struggle with low-resource languages like Basque (~1000x less data than English).
• The standard instruction-tuning pipeline (base model → CPT, i.e. continual pre-training → instruction tuning) may not be optimal for low-resource scenarios.

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[3/7] 🔬 Our experimental setup: 17 model variants using different backbone models (base/instruct) and data combinations (Basque corpus, English/Basque synthetic instructions).
Evaluated with 🎯 benchmarks AND 🫂 human preferences from 1,285 Basque speakers (12,890 annotations).

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[4/7] Key findings:
1⃣ Language corpora are essential: models need exposure to plain Basque text.
2⃣ Starting from instruction-tuned models beats the standard base→instruct pipeline.
3⃣ English-only instructions work well, but combining them with Basque instructions yields the most robust models.

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[5/7] 🎉 Bonus results! Our 70B model approaches the performance of frontier models like GPT-4o and Claude 3.5 Sonnet on both Basque benchmarks and human evaluation, even outperforming GPT-4o on local knowledge tasks.

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

[6/7] 🙏 Thanks to the Basque-speaking community for their participation! 💻 We're releasing models, synthetic instruction datasets, and human preference data to support future research on low-resource languages: github.com/hitz-zentroa/l…

HiTZ zentroa (UPV/EHU) (@hitz_zentroa)

We also had Maite Heredia present her PhD thesis work so far, titled "Evaluation of LLMs in Multilingual Settings: The Case of Code-Switching", which explores code-switching (CS) generation and evaluation for high- and low-resource language pairs.

Eneko Agirre @eagirre.bsky.social (@eagirre)

For many languages, open chatbots do not perform well. What is the best method for building open chatbots for small languages? A recently published study brings good news: we have managed to build a high-quality chatbot for Basque! Note: labur.eus/fltqqify 1/8 🧵👇
