Dimitris Bertsimas (@dbertsim) 's Twitter Profile
Dimitris Bertsimas

@dbertsim

MIT professor, analytics, optimizer, Machine Learner, entrepreneur, philatelist

ID: 896010487346941952

Link: http://www.mit.edu/~dbertsim/ · Joined: 11-08-2017 14:08:42

53 Tweets

2.2K Followers

92 Following

npj Digital Medicine (@npjdigitalmed) 's Twitter Profile Photo

The Holistic AI in Medicine (HAIM) framework from Dimitris Bertsimas et al. in MIT Jameel Clinic for AI & Health is a pipeline to receive multimodal patient data + use generalizable pre-processing + #machinelearning modelling stages adaptable to multiple health-related tasks. nature.com/articles/s4174…

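For readers curious what such a pipeline looks like in practice, here is a minimal sketch (not the authors' code): pre-computed per-modality embeddings are fused by concatenation and a single downstream classifier is trained for the task. The modality names, embedding sizes, labels, and the gradient-boosting model are all placeholder assumptions.

```python
# Illustrative sketch only: fuse pre-computed per-modality embeddings by
# concatenation and train one downstream model per prediction task.
# Modality names, dimensions, and labels are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_patients = 500

# Stand-ins for embeddings produced by modality-specific pre-processing
# (e.g., tabular demographics, clinical notes, chest X-rays, time series).
embeddings = {
    "tabular": rng.normal(size=(n_patients, 16)),
    "notes":   rng.normal(size=(n_patients, 64)),
    "images":  rng.normal(size=(n_patients, 128)),
}
# Synthetic outcome that depends weakly on one tabular feature.
y = (embeddings["tabular"][:, 0] + 0.5 * rng.normal(size=n_patients) > 0).astype(int)

# Fusion step: concatenate all modality embeddings into one feature vector.
X = np.hstack([embeddings[m] for m in sorted(embeddings)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```
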
Stefanos Kechagias (@stefanoske) 's Twitter Profile Photo

If you are into #MachineLearning and #Statistics, check this out. I would also highly recommend the book Machine Learning Under a Modern Optimization Lens by Dimitris Bertsimas and Dunn. Here are two must-watch (imo) teaser YouTube videos: youtu.be/7w9aRrYgGEs youtu.be/jJgdJaCo568

MIT Jameel Clinic for AI & Health (@aihealthmit) 's Twitter Profile Photo

pipelines that can consistently be applied to train multimodal AI/ML systems & outperform their single-modality counterparts has remained challenging. #JameelClinic faculty lead Dimitris Bertsimas, executive director ignacio fuentes, postdoc Luis Ruben Soenksen, Yu Ma, Cynthia Zeng,... (2/4)

Dimitris Bertsimas (@dbertsim) 's Twitter Profile Photo

My book with David Gamarnik, “Queueing Theory: Classical and Modern Methods,” was published. It was a long journey that lasted two decades, but both of us are delighted with the journey's completion. For more details see dynamic-ideas.com/books/quueing-…

Ryan Cory-Wright (@ryancorywright) 's Twitter Profile Photo

Delighted to share that our paper "A new perspective on low-rank optimization" has just been accepted for publication by Math Programming! Valid & often strong lower bounds on low-rank problems via a generalization of the perspective reformulation from mixed-integer optimization
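
For background only (the paper's contribution is a matrix generalization, which is more involved), the classical scalar perspective reformulation from mixed-integer optimization that the tweet alludes to replaces a quadratic term governed by a binary indicator with its perspective, tightening the continuous relaxation:

```latex
\[
\min_{x,\; z\in\{0,1\}} \; x^2 + c\,z \quad \text{s.t.}\quad x(1-z)=0
\;\;\longrightarrow\;\;
\min_{x,\; z\in[0,1]} \; \frac{x^2}{z} + c\,z ,
\qquad \text{with } \tfrac{0^2}{0} := 0 .
\]
```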

Ryan Cory-Wright (@ryancorywright) 's Twitter Profile Photo

📢New preprint alert! arxiv.org/abs/2303.07695 We use sampling schemes and clustering to improve the scalability of deterministic Benders decomposition on data-driven network design problems, while maintaining optimality. w/ Dimitris Bertsimas, Jean Pauphilet, and Periklis Petridis
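
The abstract only names the ingredients, so the sketch below is just a toy illustration of one way clustering can shrink the per-iteration work of a scenario-based decomposition: cluster the scenarios, keep one weighted representative per cluster, and solve subproblems only for those. The data, k-means, and medoid selection are assumptions, not the paper's algorithm.

```python
# Toy illustration (not the paper's algorithm): reduce many demand scenarios
# to a few cluster representatives before solving per-scenario subproblems
# inside a Benders-style decomposition loop.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scenarios = rng.lognormal(mean=2.0, sigma=0.5, size=(10_000, 20))  # hypothetical demands

k = 25  # number of representative scenarios to keep
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scenarios)

# Pick the actual scenario closest to each centroid (a medoid), and weight it
# by its cluster size so the reduced problem still reflects the full set.
representatives, weights = [], []
for j in range(k):
    members = np.where(km.labels_ == j)[0]
    dists = np.linalg.norm(scenarios[members] - km.cluster_centers_[j], axis=1)
    representatives.append(members[np.argmin(dists)])
    weights.append(len(members) / len(scenarios))

print("subproblems to solve per Benders iteration:", len(representatives))
```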

Dimitris Bertsimas (@dbertsim) 's Twitter Profile Photo

The paper presents a novel holistic deep learning framework that improves accuracy, robustness, sparsity, and stability over standard deep learning models, as demonstrated by extensive experiments on both tabular and image data sets. arxiv.org/abs/2110.15829

Dimitris Bertsimas (@dbertsim) 's Twitter Profile Photo

As part of HIAS, and together with Professor Georgios Stamou from NTUA, Greece, we are offering a course on Universal AI (in English, free of charge) on July 3-5, 2023 in Athens, Greece: aicourse2023.hellenic-ias.org. Prospective participants can declare their interest on the website.

Wes Gurnee (@wesg52) 's Twitter Profile Photo

Neural nets are often thought of as feature extractors. But what features are neurons in LLMs actually extracting? In our new paper, we leverage sparse probing to find out arxiv.org/abs/2305.01610. A 🧵:

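As a rough, self-contained sketch of the sparse-probing idea (synthetic data; the paper's selection procedure is more principled): rank neurons by how differently they activate on tokens with vs. without a feature, keep only the top k, and fit a linear probe on that small support.

```python
# Minimal sketch of sparse probing: find a small set of neurons whose
# activations linearly predict a binary feature. Synthetic data; the paper's
# actual selection procedure is more sophisticated than this heuristic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, n_neurons, k = 5000, 2048, 4

X = rng.normal(size=(n_tokens, n_neurons))   # neuron activations per token
y = rng.integers(0, 2, size=n_tokens)        # 1 if the feature is present
X[y == 1, :3] += 2.0                         # plant a 3-neuron signal

# Rank neurons by class-mean difference and keep only the top k.
scores = np.abs(X[y == 1].mean(0) - X[y == 0].mean(0))
support = np.argsort(scores)[-k:]

X_tr, X_te, y_tr, y_te = train_test_split(X[:, support], y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("selected neurons:", support, "probe accuracy:", probe.score(X_te, y_te))
```
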
Wes Gurnee (@wesg52) 's Twitter Profile Photo

One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc.). When these neurons are deleted, the loss increases in the relevant context while other contexts are left unaffected!

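A minimal sketch of the ablation experiment described here, assuming GPT-2 from Hugging Face as a stand-in (the paper studies Pythia models) and arbitrary neuron indices: zero a few post-activation MLP neurons with a forward hook and compare the loss change on in-context vs. out-of-context text.

```python
# Rough sketch of a context-neuron ablation: zero out a few MLP neurons and
# compare the language-modelling loss on in-context vs. out-of-context text.
# GPT-2 and the neuron indices are arbitrary stand-ins for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer, neurons = 6, [17, 123, 512]  # hypothetical "French context" neurons

def ablate(module, inputs, output):
    output[..., neurons] = 0.0       # zero the selected post-activation neurons
    return output

def loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

french = "Le chat est assis sur le tapis et regarde par la fenêtre."
english = "The cat is sitting on the mat and looking out the window."

base = loss(french), loss(english)
handle = model.transformer.h[layer].mlp.act.register_forward_hook(ablate)
ablated = loss(french), loss(english)
handle.remove()

print("loss change (French): ", ablated[0] - base[0])
print("loss change (English):", ablated[1] - base[1])
```
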
Wes Gurnee (@wesg52) 's Twitter Profile Photo

But what if there are more features than there are neurons? This results in polysemantic neurons which fire for a large set of unrelated features. Here we show a single early layer neuron which activates for a large collection of unrelated n-grams.

Wes Gurnee (@wesg52) 's Twitter Profile Photo

Early layers seem to use sparse combinations of neurons to represent many features in superposition. That is, they use the activations of multiple polysemantic neurons to boost the signal of the true feature over all interfering features (here “social security” vs. adjacent bigrams).

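A tiny numpy toy of superposition, purely illustrative: with far more features than neurons, each feature stored along a random direction still reads out with a signal that dominates the interference from the other active features.

```python
# Toy numpy demo of superposition: more features than neurons, each feature
# stored along a (nearly orthogonal) random direction over many neurons.
# Reading a feature out through its direction recovers a strong signal even
# though every individual neuron is polysemantic. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 256, 2048

# Each feature is a random unit vector over the neurons (an overcomplete basis).
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

active = [3, 41, 200]                     # features present on this token
activations = directions[active].sum(0)   # neuron activations in superposition

readout = directions @ activations        # project onto every feature direction
print("signal for active features:    ", readout[active].round(2))
print("max interference from the rest:", np.delete(readout, active).max().round(2))
```
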
Wes Gurnee (@wesg52) 's Twitter Profile Photo

Results in toy models from Anthropic and Chris Olah suggest a potential mechanistic fingerprint of superposition: large MLP weight norms and negative biases. We find a striking drop in early layers in the Pythia models from EleutherAI and Stella Biderman.

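A short sketch of how one could compute the two quantities mentioned here, per layer: the input-weight norm of each MLP neuron and the MLP bias. GPT-2 is used only because its module layout is simple; the paper's plots are for Pythia models.

```python
# Sketch: per-layer statistics of MLP input weights and biases, the quantities
# the tweet points to as a possible fingerprint of superposition. GPT-2 is a
# convenient stand-in here; the paper's figures are for Pythia models.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

for i, block in enumerate(model.transformer.h):
    W = block.mlp.c_fc.weight        # (hidden, intermediate) for GPT-2's Conv1D
    b = block.mlp.c_fc.bias          # (intermediate,)
    neuron_norms = W.norm(dim=0)     # input-weight norm of each MLP neuron
    print(f"layer {i:2d}  mean |w| = {neuron_norms.mean().item():.3f}"
          f"  mean bias = {b.mean().item():.3f}")
```
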
Wes Gurnee (@wesg52) 's Twitter Profile Photo

What happens with scale? We find representational sparsity increases on average, but different features obey different scaling dynamics. In particular, quantization and neuron splitting: features both emerge and split into finer-grained features.

Wes Gurnee (@wesg52) 's Twitter Profile Photo

While we found tons of interesting neurons with sparse probing, it requires careful follow-up analysis to draw more rigorous conclusions. E.g., athlete neurons turn out to be more general sport neurons when analyzing max average activating tokens.

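A small sketch of the "max average activating tokens" analysis mentioned here: group one neuron's recorded activations by token and rank tokens by mean activation. The corpus and activation values below are synthetic stand-ins for values recorded during a forward pass (e.g., with a hook like the one sketched earlier).

```python
# Sketch of a "max average activating tokens" analysis: given one neuron's
# activation on every token of a corpus, rank tokens by their mean activation.
import numpy as np
from collections import defaultdict

tokens = ["the", "game", "season", "coach", "the", "league", "of", "coach"]  # toy corpus
acts = np.array([0.1, 2.3, 1.9, 3.1, 0.0, 2.7, 0.2, 2.9])                   # one neuron

sums, counts = defaultdict(float), defaultdict(int)
for t, a in zip(tokens, acts):
    sums[t] += a
    counts[t] += 1

mean_act = {t: sums[t] / counts[t] for t in sums}
for t, a in sorted(mean_act.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{t:10s} mean activation {a:.2f}")
```
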
Wes Gurnee (@wesg52) 's Twitter Profile Photo

Precision and recall can also be helpful guides, and remind us that we should not assume a model will learn to represent features in an ontology convenient or familiar to humans.

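A sketch of the precision/recall check, with synthetic data: threshold a single neuron's activation, treat it as a classifier for the hypothesized feature, and score it against the labels.

```python
# Sketch: treat one neuron's thresholded activation as a classifier for a
# hypothesized feature and score it with precision and recall. Synthetic data.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
is_feature = rng.integers(0, 2, size=2000).astype(bool)   # ground-truth labels
activation = rng.normal(size=2000) + 2.0 * is_feature     # neuron fires more on feature

predicted = activation > 1.0                              # hypothetical threshold
print("precision:", precision_score(is_feature, predicted).round(3))
print("recall:   ", recall_score(is_feature, predicted).round(3))
```
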
Wes Gurnee (@wesg52) 's Twitter Profile Photo

This paper would not have been possible without my coauthors Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, and Dimitris Bertsimas or all the foundational and inspirational work from Chris Olah, Yonatan Belinkov, and many others! Read the full paper: arxiv.org/abs/2305.01610

JAMA Surgery (@jamasurgery) 's Twitter Profile Photo

Interpretable machine learning methodologies are powerful tools to diagnose and remedy system-related bias in care, such as disparities in access to postinjury rehabilitation care. ja.ma/3P8Sdjc Haytham Kaafarani Dimitris Bertsimas Anthony Gebran @LMaurerMD

arXiv math.OC Optimization and Control (@mathocb) 's Twitter Profile Photo

Dimitris Bertsimas, Georgios Margaritis: Global Optimization: A Machine Learning Approach arxiv.org/abs/2311.01742 arxiv.org/pdf/2311.01742