
Andrew Lamb
@andrewlamb1111
Apache {DataFusion, Arrow} PMC, Database Engineer
ID: 1326266114805002241
http://andrew.nerdnetworks.org/ 10-11-2020 20:51:18
571 Tweets
2.2K Followers
58 Following




This is so cool -- an example of embedding a special index (a DistinctValues index no less) inside an Apache Parquet file: github.com/apache/datafus… (coming in ApacheDataFusion 49.0.0)





New blog post about cooperative scheduling using tokio and Rust async, and how cancellation works in ApacheDataFusion: datafusion.apache.org/blog/2025/06/3…


We’ve adopted ApacheDataFusion in RisingWave for the Apache Iceberg compaction service.

I publicly apologize for snapping at Yuchen Liang and Andy Pavlo (@andypavlo.bsky.social) and CMU Database Group. "You need to have a push based scheduler to do ..." TUM / DuckDB (CWI) created group-think in Databases where push schedulers are required, ClickHouse, Spark, DataFusion, etc. notwithstanding 🤦

Quite a list of contributors already to the Rust Apache Parquet implementation of Variant (support for semi-structured data). I was making some slides to explain what Variant is and made up a list I wanted to share. The feature will be amazing: github.com/apache/arrow-r…


Example of the kind of low level obsession we foster in the Rust parquet / arrow / DataFusion community: github.com/apache/arrow-r… I am skeptical that proprietary engines will be able to compete with OSS long term (though I am biased). Huge thanks to Qi Zhu and Daniël Heres

From what I can see, commercial open-source software keeps pulling ahead of closed-source alternatives. Trust is the primary driver - technical excellence comes second. I’ve even seen companies pick a product solely because it’s written in Rust Language.

I am speaking at the #ApacheIceberg NYC Meetup on July 10th about Variant in Apache Parquet, which enables more efficient processing of semi-structured data such as that found in JSON. lu.ma/95a5qys1


Speaking of Apache Parquet performance obsession, Jörn Horstmann just dropped this 💣 for faster writing: github.com/apache/arrow-r…

Sweet VLDB paper from TUM (Mateusz Gienieczko / github.com/v0ldek) proposing extending Apache Parquet using user defined encodings (via WASM). Favorite image shows the ease of integrating into ApacheDataFusion gienieczko.com/anyblox-paper
