
Wes McKinney
@wesmckinn
Principal Architect @posit_pbc, GP @ComposedVC, Co-founder @voltrondata. OSS: @ApacheArrow @pandas_dev @IbisData, "Python for Data Analysis" book
ID: 115494880
https://wesmckinney.com 18-02-2010 21:01:15
8,8K Tweet
57,57K Followers
890 Following



Insightful post on why Apache Iceberg may not be a one-size-fits-all solution when it comes to a table format to manage large multimodal ML/AI datasets

I've been using a data science agent called Vincent for the past few months and really like it! It works natively with Jupyter notebooks in VSCode: marketplace.visualstudio.com/items?itemName… Write a prompt and it creates a first draft of the notebook. Data science use cases are narrow enough that it



1/ We just raised $17M to build the multimodal data stack for Physical AI! 🚀 Lead: Point Nine 🇺🇦 With: @CostanoaVC, Sunflower Capital, seedcamp Angels including: Guillermo Rauch, Eric Jang, Oliver Cameron, Wes McKinney, Nicolas Dessaigne, Arnav Bimbhet Thesis: rerun.io/blog/physical-…


The lakehouse paradigm represents a radical transformation in data architectures, welcoming in an era of unprecedented interoperability. The next wave of multi-billion-dollar infrastructure giants are here ⤵️ Read on from Janelle Teng & Lauri Moore: bvp.com/atlas/roadmap-…


Take the ferry to Data Council, but beware the DATA KRAKEN. Open water. No traffic. Just Wi-Fi, a full bar and a smooth ride. p.s. Your Clipper Card works on the ferry. Add to your Apple Wallet. p.p.s. Blue Bottle Coffee at the Ferry Building opens at 6:30am. 📅 April 22-24 |


I’m excited about xorq! Ibis and DataFusion brought together to orchestrate multi-engine data pipelines, all powered by ApacheArrow github.com/xorq-labs/xorq

xorq - An exciting tool in Modern Data Engineering, built on top of Ibis, DataFusion and technically ApacheArrow. xorq was developed to give Python developers a more ergonomic way to build, cache, and serve pipelines, without getting locked into a single engine. 1. Simplifying

World's Fastest TPCH Data Generator, courtesy of ApacheDataFusion's community. Scale Factor 100 in under 2 minutes on a MacBook Air. Open Source, no-dependency Rust. Thanks to CMU Database Group and Wan Shen Lim (@wslim.bsky.social) for the inspiration datafusion.apache.org/blog/2025/04/1… youtube.com/watch?v=UYIC57…





😍 > To the ApacheDataFusion Community: The intermediate representation of the SQL compiler is the DataFusion logical plan which has proven to be pragmatic, extensible, and easy to work with in all the right ways. github.com/dbt-labs/dbt-f…


Project from someone at Apple about building a distributed in-memory cache using ApacheDataFusion LinkedIn: linkedin.com/posts/andrey-v… Design: docs.google.com/document/d/1xj…


Nice bumping into two data science legends (Hadley Wickham and Wes McKinney) at the Databricks conference on the 18th birthday of ggplot2
