Apache Parquet (@apacheparquet) Twitter Tweets • TwiCopy

Apache Parquet

@apacheparquet

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
It provides high performance compression

ID: 1342646282

https://parquet.apache.org • 10-04-2013 19:07:49

364 Tweets

8.8K Followers

26 Following

Raniere Silva

@rgaiacs

7 years ago

Last speaker in the #europython scientific room before lunch is Peter Hoffmann, talking about #Pandas and #Dask to work with large datasets in Apache Parquet.

Likes: 34 • Replies: 2 • Retweets: 16

Gyula Fora

@gyulafora

7 years ago

Gabor Hermann bol Apache Kafka Apache Parquet Apache Flink @bol_com_Techlab Have a look at the Apache Flink bucketing sink rework for the upcoming release and the Parquet writer ;)

Likes: 3 • Replies: 3 • Retweets: 2

Julien Le Dem

@j_

7 years ago

PSA: If you use the page-level statistics in Apache Parquet please chime in on JIRA: issues.apache.org/jira/browse/PA…

Likes: 2 • Replies: 0 • Retweets: 3

I tweeted this ten years ago today. At the time I didn’t quite realize how much impact this little side project would have. To ten years of Parquet! Thanks to all the people who came along for the ride.

Likes: 71 • Replies: 1 • Retweets: 8

Julien Le Dem

@j_

a year ago

It’s happened! The Apache Parquet Java implementation repo is now called parquet-java. Thank you Andrew Lamb for the nudge! This further clarifies that Parquet is used far beyond the Hadoop ecosystem. Maybe whoever created this repo could have thought of this to start with.

Likes: 22 • Replies: 0 • Retweets: 4

Andrew Lamb

@andrewlamb1111

a year ago

To anyone who thinks Apache Parquet is dead, it is showing renewed signs of life 🌹

Likes: 36 • Replies: 4 • Retweets: 1

Andrew Lamb

@andrewlamb1111

a year ago

Turns out Apache Parquet Bloom filters are better than I think many people understand. Trevor Hilton found that for a cost of 2K-8K per row group on high cardinality predicate columns, you can filter all but the exact row group of interest.

Likes: 60 • Replies: 0 • Retweets: 11
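
To make the Bloom filter tweet above concrete, here is a rough sketch of the general idea: one small filter per row group, checked before the row group is read. This is only an illustration under stated assumptions. It uses a plain Bloom filter built on SHA-256, whereas the Parquet format specifies a split-block Bloom filter with xxHash, which this sketch does not implement. The class `BloomFilter`, the helper `candidate_row_groups`, the ids, and the 4 KB size are all hypothetical choices picked to sit inside the 2K-8K range mentioned in the tweet.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter: k probes into a fixed-size bit array.

    Illustration only. Parquet's real filters use a split-block layout
    with xxHash, which is not reproduced here.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single SHA-256 digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def candidate_row_groups(filters, value):
    """Return indexes of row groups that cannot be ruled out for `value`."""
    return [i for i, f in enumerate(filters) if f.might_contain(value)]


# Hypothetical data: 4 row groups, each holding 1000 distinct ids.
row_groups = [[f"id-{g}-{n}" for n in range(1000)] for g in range(4)]

filters = []
for values in row_groups:
    bf = BloomFilter(num_bits=4096 * 8, num_hashes=7)  # 4 KB per row group
    for v in values:
        bf.add(v)
    filters.append(bf)

candidates = candidate_row_groups(filters, "id-2-417")
print(candidates)
```

A Bloom filter never produces false negatives, so the row group that really holds the value is always among the candidates. Other row groups appear only as occasional false positives, and the larger the filter relative to the number of distinct values, the rarer those become.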

Julien Le Dem

@j_

8 years ago

Come hear me talk about ApacheArrow and Apache Parquet at #NABDConf in Palo Alto next Tuesday! x.com/jqcoffey/statu…

Likes: 7 • Replies: 1 • Retweets: 4

Ioannis Athanasiadis

@inathens

8 years ago

At IEEE/ACM UCC/BDCAT today in #Austin presenting our work with Plantbreeding WUR on managing #agri #genomic #bigdata with Apache Spark and Apache Parquet

Likes: 8 • Replies: 1 • Retweets: 8

Shubham Chaudhary

@ylogx

8 years ago

Working with a 10Gig csv data. Pandas read_csv took 16mins to load the csv into memory. Converted to Apache Parquet with ApacheArrow. It took 30 secs to read into pyarrow table and 16 sec to convert to pandas dataframe. 16mins => 46sec! tech.blue-yonder.com/efficient-data…

Likes: 512 • Replies: 13 • Retweets: 149

Shubham Chaudhary

@ylogx

8 years ago

Apache Parquet ApacheArrow Also the file size went down from 10Gigs to 3Gigs without any compression.

Likes: 17 • Replies: 1 • Retweets: 6

f0nzie@OilGasAnalytics

@fonhzie

8 years ago

I wonder if we have Apache Parquet in #rstats x.com/ylogx/status/9…

Likes: 6 • Replies: 1 • Retweets: 6

Jeeva

@jeeva_g

7 years ago

Is there a way to #sqoop from mssql to #s3 as a parquet directly? #awsemr Apache Parquet Apache Hadoop #bigdata #datalake

Likes: 1 • Replies: 3 • Retweets: 2

Mustafa Akın

@mustafaakin

7 years ago

You do not need Spark to create Apache Parquet files, you can use plain Java and it can even fit in AWS Lambda for a serverless solution: engineering.opsgenie.com/analyzing-aws-…

Likes: 14 • Replies: 0 • Retweets: 7

Julien Le Dem

@j_

7 years ago

If you’re a company using open source projects and not sure how to contribute, a release engineer would be a tremendous help. It’s hard to do this properly part time. I have a specific project in mind, if you need a hint.

Likes: 7 • Replies: 0 • Retweets: 7

lucien fregosi

@lulufrego

7 years ago

Great benchmark between Apache Parquet on #hdfs and Apache Kudu: blog.clairvoyantsoft.com/guide-to-using… In short, Kudu is faster than Parquet for random-access queries like CRUD operations but slower for analytics queries.

Likes: 12 • Replies: 0 • Retweets: 10

Renee Yao

@reneeyao1

7 years ago

Join the #GPU accelerated #analytics and #ML revolution. ApacheArrow Apache Parquet and GPU Open Analytics Initiative #GTC18

Likes: 9 • Replies: 0 • Retweets: 8

Florian Rathgeber @frathgeber.bsky.social

@frathgeber

7 years ago

2nd #PyDataLDN #keynote - holden karau & Boo (Programmer) walk us through a zoo of #tools for #BigData & #distributed #data in #Python: #Apache #Spark, #PySpark, #Arrow, #Beam, #Parquet & #Dask Apache Spark ApacheArrow Apache Beam Apache Parquet Dask #PyData PyData NumFOCUS

Likes: 24 • Replies: 0 • Retweets: 10