Big Tech Digest #14: GitHub Copilot tips & tricks, Real-time Fraud Detection at Instacart, Kafka at Walmart, Writing SQL 2x faster with AI at Intuit, and more!

2024-04-02 | Big Tech Digest #14

Happy Tuesday 👋!

This week, I’m presenting the most notable articles published since the last Big Tech Digest issue, from Meta, GitHub, Walmart, Instacart, Airbnb, Lyft, and many more!

One thing you could do to help me grow Big Tech Digest is to share it with your teammates 🙏!

Without further ado, let’s get started!

// 🏆 Must reads

1. "Using GitHub Copilot in your IDE: Tips, tricks and best practices"

by Kedasha Kerr ⸱ GitHub ⸱ 8 min read ⸱ 25 Mar

Explores the capabilities of GitHub Copilot beyond code completion
Gives tips on providing context for better code suggestions
Introduces tips such as opening relevant files and setting includes/references
Discusses the use of comments and sample code to provide context
Shares tips for using GitHub Copilot Chat effectively

2. "Building Meta’s GenAI Infrastructure"

by Kevin Lee, Adi Gangidi, Mathew Oldham ⸱ Meta ⸱ 7 min read ⸱ 12 Mar

Explores the network fabric solutions for large-scale training clusters
Gives an overview of the Grand Teton in-house-designed GPU hardware platform
Covers the storage deployment and partnership with Hammerspace
Explains the performance optimization and software changes for large clusters

3. "Real-time Fraud Detection with Yoda and ClickHouse"

by Nick Shieh ⸱ Instacart ⸱ 7 min read ⸱ 18 Mar

Explains why ClickHouse was chosen as the real-time datastore
Covers the characteristics of a Fraud Platform
Shares the overview of Yoda’s real-time rule lifecycle
Discusses the integration of Yoda’s feature system with ClickHouse
Describes the data ingestion pipeline for real-time data from PostgreSQL and Instacart event data
Explores the self-serve, config-driven Flink ingestion job for ClickHouse

// 📬 Optional reads

a.k.a. The Best of the Rest!

"Moderating Inappropriate Video Content at Yelp"

by Prateek Yadav ⸱ Yelp ⸱ 4 min read ⸱ 27 Mar

Yelp's Trust and Safety team discusses how they protect users from inappropriate content in videos
The moderation pipeline includes a matching service, deep learning model, and human evaluation
They use a combination of strategies to minimize false positives and efficiently moderate videos

"Introducing Trio | Part I"

by Eli Hart ⸱ Airbnb ⸱ 9 min read ⸱ 28 Mar

Describes how Airbnb developed Trio, an Android framework for Jetpack Compose screen architecture
Explores the challenges of Fragment-based architecture and the motivation behind building Trio
Gives an overview of Trio's architecture, including the use of Trios for building features, standardizing state management, and enabling type-safe navigation and inter-screen communication
Discusses the Trio class, its creation and the standardization of dependencies, and the UI class, its implementation and enforcement of unidirectional data flow
Shares the process of rendering a Trio, including collecting state flow, customizing entry and exit animations, and managing lifecycle and saved state

"How Intuit data analysts write SQL 2x faster with internal GenAI tool"

by Robin Oliva-Kraft ⸱ Intuit ⸱ 6 min read ⸱ 27 Mar

Intuit developed an internal generative AI-powered tool called Query Kickstart to accelerate SQL query authoring
A study with 25 data analysts found that Query Kickstart users were able to write SQL queries 2.2x as fast as those who did not use it
The study showed that Query Kickstart improved productivity even at its current level of accuracy, and that it was more helpful for analysts with less experience or shorter Intuit tenure
Participants using Query Kickstart were 2.4–3.5x as fast as the control group when working on unfamiliar data

"Unlocking observability: Structured logging in Spring Boot"

by Mourjo Sen ⸱ Booking.com ⸱ 8 min read ⸱ 26 Mar

Introduces structured logging and its benefits over unstructured logging
Describes the use of structured logging with the companion project Jamboree, built using Spring Boot on Java
Gives an overview of how to add structured logging using SLF4J, including log format, generation, and ingestion into logging infrastructure
Explores the use of MDC for adding contextual information and how to handle logging context in a multi-threaded environment
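
To make the idea of structured logging with per-request context more concrete, here is a minimal sketch. It is not the article's Spring Boot/SLF4J setup: it is a Python analogue using only the standard library, where a `contextvars.ContextVar` plays a role loosely similar to MDC. The names `JsonFormatter`, `build_logger`, and `log_with_context` are hypothetical and exist only for this illustration.

```python
import contextvars
import io
import json
import logging

# Holds contextual fields for the current request (loosely analogous to MDC).
# The default dict is never mutated; new dicts are created on every set().
_request_context = contextvars.ContextVar("request_context", default={})


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge in whatever context is active when the record is formatted.
        payload.update(_request_context.get())
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("structured-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def log_with_context(logger, message, **context):
    """Log a message with extra fields, then restore the previous context."""
    token = _request_context.set({**_request_context.get(), **context})
    try:
        logger.info(message)
    finally:
        _request_context.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    demo_logger = build_logger(buffer)
    log_with_context(demo_logger, "order placed", request_id="r-1")
    print(buffer.getvalue(), end="")
```

Because every record is a single JSON object, a log pipeline can index fields such as `request_id` directly instead of parsing free-form text.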

"Lyft’s Reinforcement Learning Platform"

by Jonas Timmermann ⸱ Lyft ⸱ 12 min read ⸱ 12 Mar

Describes the stages of maturity of applied RL: MAB, CB, and Full-RL
Presents the benefits of RL such as online learning and optimizing for the whole decision-making process
Explains the challenges of using RL, including the lack of mature libraries and guidance on best practices
Shares a demo of a recommendation model using Contextual Bandit
Covers the integration of RL models into existing model training and serving systems, leveraging open-source libraries like Vowpal Wabbit and RLlib
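
As a rough illustration of the first maturity stage mentioned above (MAB), here is a minimal epsilon-greedy multi-armed bandit in plain Python. This is a swapped-in textbook technique, not Lyft's platform and not the Vowpal Wabbit or RLlib APIs; the class name `EpsilonGreedyBandit` and its parameters are assumptions made for this sketch.

```python
import random


class EpsilonGreedyBandit:
    """Textbook epsilon-greedy bandit with incremental mean reward estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # how many times each arm was updated
        self.values = [0.0] * n_arms  # running mean reward per arm
        self._rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    bandit = EpsilonGreedyBandit(n_arms=3, epsilon=0.1, seed=42)
    arm = bandit.select_arm()
    bandit.update(arm, reward=1.0)  # feedback arrives online, one event at a time
    print(bandit.counts, bandit.values)
```

A contextual bandit extends this by conditioning the arm choice on features of the current request rather than keeping one global estimate per arm.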

"Building Walmart’s Seamless Communication: Leveraging Kafka’s Custom Partitioning"

by Rajesh Kumar Sahu ⸱ Walmart ⸱ 4 min read ⸱ 13 Mar

Explores the key features of custom partitioning in Kafka and how it affects data distribution and processing
Describes practical scenarios where custom partitioning offers distinct advantages, such as multi-tenancy, prioritization, throttling, and geographical data
Gives an overview of the mechanics of implementing custom partitioning in Kafka, including implementing the Partitioner interface and writing the partitioning logic
Shares how custom partitioning can improve performance and tailor data distribution to specific needs
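
The kind of logic a custom partitioner might contain can be sketched as follows. This is plain Python showing only the decision rule, not the Kafka client API (in the Java client this logic would live in the `partition` method of a `Partitioner` implementation). The function name `choose_partition`, the "reserved partitions for priority tenants" policy, and the use of CRC32 as the hash are all assumptions made for illustration, not Walmart's implementation.

```python
import zlib


def choose_partition(key, num_partitions, priority_keys, reserved=1):
    """Pick a partition for a record key.

    Keys listed in priority_keys are routed to the first `reserved`
    partitions; all other keys are spread over the remaining ones.
    """
    if num_partitions <= reserved:
        raise ValueError("need more partitions than reserved ones")
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    digest = zlib.crc32(key.encode("utf-8"))
    if key in priority_keys:
        return digest % reserved
    return reserved + digest % (num_partitions - reserved)


if __name__ == "__main__":
    vip = frozenset({"tenant-vip"})
    for tenant in ("tenant-vip", "tenant-a", "tenant-b"):
        print(tenant, choose_partition(tenant, num_partitions=6, priority_keys=vip))
```

Keeping the rule deterministic matters: records with the same key must always land on the same partition if per-key ordering is expected.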

"Exploring Kafka Consumer At-Least-Once Delivery Guarantees: Expectations vs. Reality"

by Jon Soul ⸱ Depop ⸱ 8 min read ⸱ 19 Mar

Discusses the different message delivery guarantees provided by Kafka for consumers
Describes the challenges and unexpected consumer behavior observed in a service using Kafka
Explores the impact of consumer service assumptions on message processing
Presents the investigation process into the nature of message duplication and offset skips
Shares the strategies and changes implemented to reduce the impact of rebalances and improve message processing reliability
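
Why at-least-once delivery can produce duplicates is easy to see in a toy simulation. The sketch below is not Kafka client code and not Depop's service: it models a consumer that processes a message first and commits the offset afterwards, with a hypothetical `crash_before_commit_at` parameter standing in for a crash or rebalance between those two steps.

```python
def consume_at_least_once(messages, committed_offset, handler,
                          crash_before_commit_at=None):
    """Process messages from committed_offset; return the new committed offset.

    The handler runs before the offset is committed, so an interruption
    between the two steps means the message is delivered again on restart.
    """
    for offset in range(committed_offset, len(messages)):
        handler(messages[offset])
        if offset == crash_before_commit_at:
            # Simulated crash: the message was processed, the commit was not.
            return committed_offset
        committed_offset = offset + 1
    return committed_offset


if __name__ == "__main__":
    log = ["a", "b", "c"]
    seen = []
    offset = consume_at_least_once(log, 0, seen.append, crash_before_commit_at=1)
    offset = consume_at_least_once(log, offset, seen.append)
    print(seen)  # "b" appears twice because its commit was lost
```

Nothing is lost in this model, but handlers have to tolerate seeing the same message more than once.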

That’s it for today! I hope you enjoyed this issue. Let me know in the comments what you have learned 💡!

Thanks for reading Big Tech Digest. Please don’t forget to 🗣️ spread the word, and see you in two weeks!

Delivered bi-weekly to your inbox, Big Tech Digest brings you a collection of links to the latest engineering blog posts from 300+ Big Tech companies and startups such as Airbnb, Uber, Netflix, and Meta. Aimed at Software Engineers and AI/ML folks at any level, Big Tech Digest focuses on interesting solutions to the engineering problems that tech companies come across. No marketing or non-tech stuff.

Subscribe now to receive a new issue directly to your inbox every two weeks!