Wes McKinney (@wesmckinn)'s Twitter Profile
Wes McKinney

@wesmckinn

Principal Architect @posit_pbc, GP @ComposedVC, Co-founder @voltrondata. OSS: @ApacheArrow @pandas_dev @IbisData, "Python for Data Analysis" book

ID: 115494880

Link: https://wesmckinney.com
Joined: 18-02-2010 21:01:15

8.8K Tweets

57.57K Followers

890 Following

DuckDB (@duckdb)

New blog post: "Query Engines: Gatekeepers of the Parquet File Format". In this post, Laurens Kuiper argues that we are wasting a lot of bits by not using the Parquet format to its full extent, a limitation caused by the lack of support for Parquet features in some systems.

Wes McKinney (@wesmckinn)

Insightful post on why Apache Iceberg may not be a one-size-fits-all solution when it comes to a table format to manage large multimodal ML/AI datasets

Anthony Goldbloom (@antgoldbloom)

I've been using a data science agent called Vincent for the past few months and really like it! It works natively with Jupyter notebooks in VSCode: marketplace.visualstudio.com/items?itemName… Write a prompt and it creates a first draft of the notebook. Data science use cases are narrow enough that it

Neon - Serverless Postgres (@neondatabase)

We’ve partnered with ParadeDB to bring pg_search to all Neon databases. 💥 This extension delivers Elasticsearch-grade full text search without leaving Postgres. Benchmark results here 👇, summary in 🧵 neon.tech/blog/pgsearch-…

Akshay Agrawal (@akshaykagrawal)

I've spent the past 3 years working with myles and Dylan Madisetti to fix Python notebooks — version with Git, run as scripts, reuse as modules. Why marimo stores notebooks as Python, not JSON: marimo.io/blog/python-no…

Rerun (@rerundotio)

1/ We just raised $17M to build the multimodal data stack for Physical AI! 🚀 Lead: Point Nine 🇺🇦 With: @CostanoaVC, Sunflower Capital, seedcamp. Angels including: Guillermo Rauch, Eric Jang, Oliver Cameron, Wes McKinney, Nicolas Dessaigne, Arnav Bimbhet. Thesis: rerun.io/blog/physical-…

Steve Yegge (@steve_yegge)

Hi all, I just dropped a new blog post: sourcegraph.com/blog/revenge-o… This one's a beehive-kicker for sure. Hope you like it and find it enlightening, even if you don't agree with all of it.

Bessemer (@bessemervp)

The lakehouse paradigm represents a radical transformation in data architectures, ushering in an era of unprecedented interoperability. The next wave of multi-billion-dollar infrastructure giants is here ⤵️ Read on from Janelle Teng & Lauri Moore: bvp.com/atlas/roadmap-…

Pete Soderling (@petesoder)

Take the ferry to Data Council, but beware the DATA KRAKEN. Open water. No traffic. Just Wi-Fi, a full bar and a smooth ride. p.s. Your Clipper Card works on the ferry. Add to your Apple Wallet. p.p.s. Blue Bottle Coffee at the Ferry Building opens at 6:30am. 📅 April 22-24 |

Wes McKinney (@wesmckinn)

I’m excited about xorq! Ibis and DataFusion brought together to orchestrate multi-engine data pipelines, all powered by ApacheArrow github.com/xorq-labs/xorq

ABC (@ubunta)

xorq: an exciting tool in modern data engineering, built on top of Ibis, DataFusion and, technically, ApacheArrow. xorq was developed to give Python developers a more ergonomic way to build, cache, and serve pipelines without getting locked into a single engine. 1. Simplifying

Andrew Lamb (@andrewlamb1111)

World's fastest TPCH data generator, courtesy of ApacheDataFusion's community. Scale Factor 100 in under 2 minutes on a MacBook Air. Open source, no-dependency Rust. Thanks to CMU Database Group and Wan Shen Lim (@wslim.bsky.social) for the inspiration. datafusion.apache.org/blog/2025/04/1… youtube.com/watch?v=UYIC57…

Bauplan (@bauplan_labs)

🚀 Introducing Bauplan: a serverless, code-native platform for building data and AI pipelines, directly on your object store. No clusters. No notebooks. No GUI-based workflows. Just Python + SQL + S3. 👉 bauplanlabs.com/blog/hello-bau…

Andrew Lamb (@andrewlamb1111)

20x faster TPCH data generator available via pip install: pip install tpchgen-cli Blog from Kevin Liu: kevinjqliu.github.io/blog/posts/tpc…

CedarDB (@cedar_db)

CedarDB Community Edition is here! Download CedarDB Community Edition today: no paywall, no signup, just pure performance. Read more about CedarDB on our blog: cedardb.com/blog/launch/

Andrew Lamb (@andrewlamb1111)

😍 > To the ApacheDataFusion Community: The intermediate representation of the SQL compiler is the DataFusion logical plan which has proven to be pragmatic, extensible, and easy to work with in all the right ways. github.com/dbt-labs/dbt-f…

Wes McKinney (@wesmckinn)

With last week's DuckLake announcement and prior explorations of a DuckDB-powered data lake such as "DuckHouse" (Flight + DuckDB using xorq), we are heading in some interesting directions: juhache.substack.com/p/from-duckdb-…

Andrew Lamb (@andrewlamb1111)

Project from someone at Apple about building a distributed in-memory cache using ApacheDataFusion. LinkedIn: linkedin.com/posts/andrey-v… Design: docs.google.com/document/d/1xj…
