Daniel Smilkov (@dsmilkov) Twitter Tweets • TwiCopy

If you're doing a lot of fine-tuning and dataset curation, definitely make sure to check out Lilac Garden. They were nice enough to run Capybara through it before official release and allowed me to see interesting insights that normal embedding clustering typically fails to show.

Teknium (e/λ)

@teknium1

a year ago

In addition, I've worked with Nikhil Thorat and the Lilac team to cluster the datasets into clusters for analysis and further curation! You can access the dataset in Lilac's Hugging Face Spaces here: lilacai-lilac.hf.space/datasets#lilac… And can access the clusters by clicking this button:

Nikhil Thorat

@nsthorat

a year ago

OpenHermes-2.5 dataset is finally here! We're hosting the full dataset with pre-computed @lilac_ai clusters in our demo. Clusters: lilacai-lilac.hf.space/datasets#lilac…

Jade

@euclaise_

a year ago

Embedding-filtered version of reddit-instruct. reddit-instruct is a QA dataset gathered from Reddit posts/comments, inspired by LIMA. This version filters it further using @lilac_ai's platform, preventing non-instruction/QA data from being included: huggingface.co/datasets/eucla…

Alex Volkov (Thursd/AI)

@altryne

a year ago

With such a warm recommendation, I had to get @lilac_ai folks on stage, and here is our conversation and a deep dive into dataset creation, curation and classification. Just posted @thursdai_pod deep dive into Lilac and RWKV. Links as always, first comment 👇

Daniel Smilkov

@dsmilkov

a year ago

Most of us underestimate what LLMs can do for understanding and transforming our datasets. That will change this year.

qnguyen3

@stablequan

a year ago

Finally have some free time! Translated 1M @teknium1 OpenHermes-2.5 samples to Vietnamese, then used @lilac_ai to reduce to 25k. Result: 25k-trained model scored the same as 1M on Vietnamese VLSP Benchmark. REALLY GOOD DATA IS ALL YOU NEED🚀 huggingface.co/datasets/ontoc…

Nikhil Thorat

@nsthorat

a year ago

40X reduction of dataset size using @lilac_ai, same performance. Attention to data is all you need.

Databricks

@databricks

a year ago

We are thrilled to announce that @Lilac_AI is joining Databricks! With the integration of Lilac’s powerful tools for data exploration, customers can accelerate the development of production-quality generative AI apps using their enterprise data. #genAI dbricks.co/3IF0uHL

Lilac, joining Databricks!

@lilac_ai

a year ago

And an article with Business Insider: businessinsider.com/databricks-acq…

Daniel Smilkov

@dsmilkov

a year ago

AI data curation meets enterprise data! Incredibly excited to announce that we are joining Databricks to help enterprise companies understand, build and curate their own AI solutions.

Jonathan Frankle

@jefrankle

a year ago

Meet DBRX, a new sota open llm from Databricks. It's a 132B MoE with 36B active params trained from scratch on 12T tokens. It sets a new bar on all the standard benchmarks, and - as an MoE - inference is blazingly fast. Simply put, it's the model your data has been waiting for.

Matei Zaharia

@matei_zaharia

a year ago

Yup, data processing matters a lot for LLMs, and we confirmed this by just quickly retraining MPT-7B on our new datasets. We built some great scalable tools for LLM data prep internally on Apache Spark and also benefited a lot from @lilac_ai.

Nikhil Thorat

@nsthorat

a year ago

✨ Lessons of a first time founder ✨ I wrote down a bunch of important lessons Daniel Smilkov and I learned over the last year building @lilac_ai. We had a short ride, but we learned a ton! This blog is mostly for technical folks who know how to build product, and are trying to

Daniel Smilkov

@dsmilkov

10 months ago

Our AI judges are like a CI system that keeps getting better and faster but you still have to write the unit tests… for now

Sean Kulinski

@seankski

8 months ago

🎉 Thrilled to share my first research project since joining Databricks Mosaic Research: a customizable synthetic data generation engine for evaluating agents! Huge shout out to Daniel Smilkov and Nikhil Thorat for the slick interface and to Jonathan Frankle and Alex Trott for research prowess. Check it out ⬇️

Nikhil Thorat

@nsthorat

5 months ago

We just released a blog post on the new GenAI evaluation features in Databricks! This is a project I've been working on with Daniel Smilkov for 6 months. GenAI evaluation is notoriously tricky, especially when developers have to collaborate with domain experts to collect high quality
