Freelance Data(bricks) Engineer | #ApacheSpark #DeltaLake #UnityCatalog #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksMVP
ID: 38913594
https://www.linkedin.com/in/jaceklaskowski/ 09-05-2009 19:41:45
25.25K Tweets
6.6K Followers
853 Following



The books have arrived 🥰 The pile is exactly how I'm gonna read them (from top to bottom), starting with the one about #ApacheIceberg. Thanks O'Reilly Media for these complimentary copies!



Jane Street has started up our tech talk series after a pandemic-driven hiatus. Our first talk is from Charlie Marsh of Ruff fame, talking about how they made uv, their new package manager for Python, so fast! youtu.be/gSKTfG1GXYQ?si…




Blog post from Xiangpeng Hao explaining the different levels of pruning ApacheDataFusion applies when reading Parquet files: blog.haoxp.xyz/posts/parquet-… The diagrams in particular are 🧑‍🍳💋







Correction: GlareDB is moving away from DataFusion! Sean Smith's excellent talk discusses problems with building a DBMS using off-the-shelf parts. Like DuckDB, the GlareDB rewrite borrows ideas from TUM Database Group's HyPer system, but it's written in Rust: youtube.com/watch?v=Sor3KZ…

Data catalogs are getting much-needed attention across #datalakehouse and #datawarehouse as the plot thickens, as they say. In this blog from Kyle Weller, we are sharing some of the deep internal research we did to support our multi-catalog sync feature in the Onehouse product.

Vinoth Chandar One thing I did not expect when doing this research was coming to the unfortunate realization that you might need more than one catalog to cover all the bases for a complete data platform solution...



We're proud to announce the Apache Hudi 1.0 release! This release has been the result of a massive community effort, with tons of new code (re)written. I want to thank all 60+ contributors who worked on ~180K lines of change. Release blog: hudi.apache.org/blog/2024/12/1…
