Elie Bursztein (@elie) 's Twitter Profile
Elie Bursztein

@elie

AI Cybersecurity @Google & @DeepMind. Help advance AI cybersecurity capabilities and make AI safe & secure for all. @EtteillaOrg Art Foundation founder.

ID: 57142771

Link: http://www.elie.net
Joined: 15-07-2009 21:28:39

3.3K Tweets

62.62K Followers

128 Following

Elie Bursztein (@elie) 's Twitter Profile Photo

Very excited to participate in the #LLM Agent MOOC Hackathon. Can't wait to see what participants will come up with! #hackathon #A

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Human Creativity in the Age of LLMs - arxiv.org/abs/2410.03703 - Worryingly, this #research shows that #AI might boost short-term creativity at the expense of long-term creativity. Figuring out how to leverage #LLM without degrading long-term human capabilities is vital.

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild arxiv.org/pdf/2405.11697 This comprehensive study clearly highlights the magnitude of the problem with great examples and measurements. #research #AI #misinformation #disinformation

François Chollet (@fchollet) 's Twitter Profile Photo

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task

Elie Bursztein (@elie) 's Twitter Profile Photo

[Tool Tuesday] LLM, the best CLI utility for interacting with large models - github.com/simonw/llm This comprehensive tool supports images, local/remote models, and shell workflows. You can, for example, type: cat mycode.py | llm -s "Explain this code" #LLM #tool #AI

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Humanity’s Last Exam - static.scale.com/uploads/654197… New large-scale knowledge benchmark where the best models barely reach 9%, with DeepSeek-R1 outperforming everyone. #benchmark #research #AI #LLM #deepseek #openai #anthropic #gemini

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training - arxiv.org/pdf/2501.17161 - Using Reinforcement Learning (RL) helps with generalizing and SFT helps with stabilizing. #AI #LLM #research #RL

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence - microsoft.com/en-us/research… The more confident people are in #GenAI, the less critically they think. #Research #AI #education

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Do Not Trust Licenses You See—Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing - lgresearch.ai/data/upload/LG… Presents LG's new dataset legal risk framework and the #AI agent built to use it to automatically review dataset compliance at scale.

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Reasoning Language Models: A Blueprint - arxiv.org/abs/2501.11223 All you need to know on how thinking/reasoning models are trained and evaluated. #AI #RLM #LLM

Elie Bursztein (@elie) 's Twitter Profile Photo

Gemma 3 is here! Our new open models are incredibly efficient - the largest 27B model runs on just one H100 GPU. You'd need at least 10x the compute to get similar performance from other models 👇 #AI #LLM

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] AI Search Has A Citation Problem - cjr.org/tow_center/we-… TL;DR: Building an agent is one thing; getting to the level of reliability where an agent can be trusted is a totally different ball game. Evaluating reliability is critical to true progress. #AI #LLM

Elie Bursztein (@elie) 's Twitter Profile Photo

Accelerating Large-Scale Test Migration with LLMs - medium.com/airbnb-enginee… Airbnb was able to leverage AI to reduce migration time by about 90% (6 weeks instead of 1.5 years) #AI #airbnb

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Measuring #AI Ability to Complete Long Tasks - arxiv.org/pdf/2503.14499 The length of tasks that models can perform roughly doubles every 7 months. That's encouraging; however, I am unsure whether this holds at the 99% success rate needed to trust agents. #LLM

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification - arxiv.org/pdf/2502.01839 If you are interested in understanding how scaling computation at generation time helps improve model performance, this is the paper to read. #AI #LLM

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] RealHarm: A Collection of Real-World Language Model Application Failures arxiv.org/abs/2504.10277 By looking at real-world examples of AI failures, this paper highlights the disconnect between what safety filters block and what goes wrong in practice. #safety #ai

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Exploring LLM Reasoning Through Controlled Prompt Variations - arxiv.org/abs/2504.02111 Shows how critical it is to have only relevant data in the model context. Accurately filtering out data is very difficult and simple vector search is NOT the answer. #AI #RAG

Elie Bursztein (@elie) 's Twitter Profile Photo

Key insights from the Phare benchmark results: popularity on benchmarks like LMArena doesn't guarantee factual reliability, and the more confidently a user phrases a query, the less willing models are to refute controversial claims - giskard.ai/knowledge/good…

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] Large Language Models, Small Labor Market Effects - nber.org/system/files/w… A recent National Bureau of Economic Research study shows that in Denmark, despite massive investment in AI, productivity gains are minimal, at about 3%. #AI #LLM #Economy

Elie Bursztein (@elie) 's Twitter Profile Photo

[Weekend Read] The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity arxiv.org/abs/2506.06941 This paper highlights that while thinking models outperform regular LLMs on reasoning tasks, they still experience
