DataFusion Belgrade Meetup 2024/09/27 #11431

gruuya · 2024-07-12T12:57:45Z

Hi all,

I'm pleased to announce a (first?) European DataFusion meetup, in Belgrade, Serbia. Some details:

Date: Friday September 27th, 2024
Time: 17:00-20:30 (CEST)
Location: Ušće Tower 2, Bulevar Mihajla Pupina 4 (11th floor @ Microsoft Development Center Serbia)
Format: 15-min talks, with free-form discussion before and after

A bit more info here: https://docs.google.com/document/d/1wlWKFRQocLGL7Rhu3BiI8geIWsowd-ZXVozIczZqrqQ/edit

alamb (Collaborator) · 2024-07-12T17:07:46Z

Awesome -- I plan on attending!

alamb (Collaborator) · 2024-07-24T23:34:14Z

Here is the signup link: https://lu.ma/tmwuz4lg

alamb (Collaborator) · 2024-09-30T11:07:53Z

Thanks again @gruuya for making this happen

2 replies

alamb Oct 2, 2024
Collaborator

Here are the slide decks I have collected

Andrew Lamb, InfluxData, Staff Engineer
DataFusion: What, Why, How
https://docs.google.com/presentation/d/1zFh-ayH922k9Rvz2lZxYzjfoemKfr8mRLpw8BLHdw7k/edit#slide=id.g26bebde4fcc_3_7

Artjoms Iskovs, EnterpriseDB, Principal Engineer @mildbyte
Reducing query latency in DataFusion via a caching object store layer
https://docs.google.com/presentation/d/1TiToVb5rVFrmuR9Dxej7HgWpyv0p_88Ise3CyQKZSzE/edit#slide=id.p1

Mehmet Ozan Kabak, Synnada CEO @ozankabak (I don't seem to have these slides)

Marko Grujic, EnterpriseDB @gruuya
Database replication using the FDAP* stack
https://docs.google.com/presentation/d/1hp0lRIwG8wpRlPMtdx-BPxXU3L2vqCBuPRkTFDgHoHo/edit#slide=id.p1

Nick Karlov, Tarantool (not sure of the github username)

Piotr Findeisen, SDF @findepi
"The Types"
https://docs.google.com/presentation/d/1VW_JCGbN22lrGUOMRvUXGpAmlJopbG02hn_SDYJouiY

SamSynnada Oct 3, 2024

Synnada presentation: https://docs.google.com/presentation/d/1i7l7bslZp3rRx0_S9ejFTC5ChyvS2bYS

SamSynnada · 2024-10-03T13:18:03Z

Recap of the event

We successfully wrapped up the first-ever Apache DataFusion Meetup in Europe on September 27, 2024, marking a significant milestone for the community. The initial idea for the event came from @alamb on [May 2, 2024](#10342), and shortly after, @gruuya took full responsibility for bringing it to life. From coordinating speakers to handling logistics, @gruuya ensured everything ran smoothly, and he [drafted the event details by July 12, 2024](#11431).

@gruuya's dedication was truly remarkable, as he personally ensured everything ran smoothly for the participants—from picking speakers up at the airport to driving them to their hotels and making sure they were well taken care of throughout the event. His hands-on approach and commitment played a key role in making the meetup run seamlessly and creating a memorable experience for everyone. A special thanks to @gruuya for his tireless efforts in bringing this event to life.

This was a major moment for the DataFusion community in Europe, bringing together leading figures from the project to share their knowledge and advancements with enthusiasts. The energy in the room and the exchange of ideas truly demonstrated the vibrancy and growth of the data ecosystem in Belgrade, which has a solid future ahead. Throughout the day, a series of compelling talks covered a wide range of topics, from the core principles of DataFusion to cutting-edge innovations and real-world applications.

Venue & Participation

Despite being the first-ever Apache DataFusion Meetup in Europe, the event was a great success with nearly 70 attendees.
Microsoft generously provided their office in Belgrade for the event, and it proved to be a truly excellent space with an amazing view! The catering, provided by EDB, was also excellent, offering a variety of local delicacies.

People started gathering around 5pm, with the talks beginning at 6pm, allowing ample time for attendees to mingle and connect before the presentations started.

Talks

The talks kicked off with @alamb, who provided an in-depth introduction to the origins and goals of Apache DataFusion. He started by describing DataFusion as LLVM for data systems, enabling innovation in data-intensive systems. @alamb highlighted DataFusion’s architecture, built with industrial best practices, and its ability to compete with tightly integrated systems. Finally, @alamb touched on the Rust-based implementation and ongoing optimizations that ensure DataFusion remains highly performant, especially in multi-core environments.

Next, @mildbyte, Principal Engineer at EDB, delivered a highly technical talk on caching optimization using DataFusion at EDB. @mildbyte explained how EDB utilizes DataFusion to optimize query caching, which leads to significant performance improvements. These optimizations are crucial for managing large-scale data systems, showcasing how EDB leverages DataFusion’s capabilities effectively.

@ozankabak, co-founder and CEO of Synnada, spoke about the challenges of building data-intensive applications, referring to the Data Chasm, a complex landscape with many moving parts that makes it difficult to manage data efficiently. He explained how DataFusion helps break down these barriers, allowing for a more streamlined approach to data processing. @ozankabak highlighted Synnada's contributions to the DataFusion project, including their work on unified data processing, which builds on top of DataFusion to simplify data workflows.

@gruuya, senior staff engineer at EDB, the hero of the day who gathered all of us for this amazing event, gave a talk focused on database replication using the FDAP (Flight, DataFusion, Arrow, and Parquet) stack. @gruuya explained how this powerful combination of open-source tools enables efficient and scalable data replication, particularly in analytic environments. By leveraging Apache Arrow for in-memory data processing and Flight for fast network data transfer, the FDAP stack ensures low-latency communication between distributed databases. DataFusion handles real-time query execution across replicated data, while Parquet optimizes storage and performance, making this stack a highly efficient solution for large-scale database replication.

@karlovnv from Tarantool followed, sharing insights on how his team is pushing the limits of big data. @karlovnv showcased their work on truly massive datasets, such as handling a 3,000-column dataset, processing 70TB of data in RAM, and doing these things really, really fast (quicker than 10ms for a fraud detection use case!?). His talk demonstrated how DataFusion plays a key role in enabling these high-performance workloads.

@findepi from SDF wrapped up the talks with a detailed exploration of types and functions in the context of Apache Arrow vs DataFusion. He explained how types are handled in Arrow and DataFusion. @findepi's insights shed light on the potential improvements that could further enhance DataFusion’s handling of data types.

After the talks, we headed to Docker (accompanied by container jokes), where the conversation continued in a more relaxed setting. It was a great way to unwind and keep sharing ideas. The success of the meetup made it clear: we should do this again to exchange more war stories and insights.

Closing remarks

Even though it was the first time, the Apache DataFusion Belgrade Meetup turned out to be a great success!

@gruuya took the helm on behalf of the analytics team at EDB, and organized everything flawlessly—thank you once again for making this meetup possible.
Microsoft provided the space and logistics (including recording, which will be released soon), and EDB provided the amazing food and drinks.
The speakers delivered a well-curated agenda, offering a range of valuable insights.
Nearly 70 attendees participated, most from the data ecosystem, including data engineers and those building data-intensive applications.

What could be improved?

The event was fantastic—these are just suggestions to make future meetups even more engaging:
- More merch/goodies like DataFusion stickers and t-shirts to bring home.
- Name tags would help break the ice and make networking easier.
- More photos and content to share the experience. We took some, you can access them via this link
- Some CTAs for the community, like joining the Discord, trying a first issue, or a walkthrough on the community.
- Broader representation of projects being built using DataFusion.

1 reply

alamb Oct 11, 2024
Collaborator

The recordings are now available on youtube:

https://youtube.com/playlist?list=PLrhIfEjaw9ilQEczOQlHyMznabtVRptyX&si=mzTM6e_oZTFhSM_U
