Big Tech Digest #15 💥: Introducing Meta Llama 3 at Meta, Scaling to Count Billions at Canva, Adopting Airflow at Booking.com and more!

Featuring articles from Meta, Netflix, DoorDash, Booking, and many more!

May 01, 2024

Happy Wednesday 👋!

This week, I’m presenting the most important and notable articles published since the last Big Tech Digest issue from Meta, Netflix, DoorDash, Booking, and many more!

There’s just one thing you could do to help me grow Big Tech Digest: go ahead and mention it to your friends and/or teammates. Thank you! 🙏


Without further ado, let’s get started!

// 🏆 Must reads

1. "Introducing Meta Llama 3: The most capable openly available LLM to date"

Meta ⸱ 10 min read ⸱ 18 Apr

Discusses the performance of the new 8B and 70B parameter models
Explains the model architecture, pretraining data, scaling up pretraining, and instruction fine-tuning
Describes the deployment of Llama 3 at scale and its availability on all major platforms: AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure and more

2. "Scaling to Count Billions"

by Sangzhuoyang Yu ⸱ Canva ⸱ 11 min read ⸱ 12 Apr

Introduces the latest architecture, built on an OLAP database
Covers the core tracking functionality as a counting pipeline
Walks through the evolution of the architecture, starting with MySQL and the challenges it raised
Presents the attempted migration of data to DynamoDB and the decision not to proceed with it
Describes the simplification using OLAP and ELT, along with the resulting improvements and remaining challenges
Shares key lessons learned about designing reliable services and introducing architectural changes
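To make the "counting pipeline" idea a bit more concrete, here is a minimal sketch of a dedupe-then-aggregate count in Python. It is hypothetical: the event fields (user_id, template_id, design_id, timestamp) and the "one usage per user, template and design" rule are assumptions for illustration, not Canva's actual schema or logic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical event shape; not Canva's real schema.
    user_id: str
    template_id: str
    design_id: str
    timestamp: int


def count_usages(events: list[UsageEvent]) -> Counter:
    """Deduplicate raw events, then aggregate usage counts per template."""
    # Deduplication (assumed rule): a user using the same template in the
    # same design counts once, however many raw events were emitted.
    unique = {(e.user_id, e.template_id, e.design_id) for e in events}
    # Aggregation: count the distinct usages per template.
    return Counter(template_id for _, template_id, _ in unique)


events = [
    UsageEvent("u1", "t1", "d1", 100),
    UsageEvent("u1", "t1", "d1", 105),  # duplicate, e.g. a retried event
    UsageEvent("u2", "t1", "d2", 110),
    UsageEvent("u2", "t2", "d2", 120),
]
print(count_usages(events))  # Counter({'t1': 2, 't2': 1})
```

Separating deduplication from aggregation keeps each stage simple to reason about and re-run, which is the general shape the article's pipeline discussion revolves around.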

// 📬 Optional reads

a.k.a. The Best of the Rest!

"The Making of VES: the Cosmos Microservice for Netflix Video Encoding"

Netflix ⸱ 10 min read ⸱ 09 Apr

Gives an overview of the three layers of a Cosmos microservice: API layer (Optimus), workflow layer (Plato), and computing layer (Stratum)
Shares lessons learned, including defining a proper service scope, being pragmatic about data modeling, and embracing service API changes
Explores the continuous release process for VES, emphasizing a short release cycle and automated deployment

"Building DoorDash’s Product Knowledge Graph with Large Language Models"

by Steven Xu and Sree Chaitanya Vadrevu ⸱ DoorDash ⸱ 7 min read ⸱ 23 Apr

Discusses the challenges of standardizing and enriching raw merchant data for DoorDash's retail catalog
Describes the use of Large Language Models (LLMs) to extract product attributes from unstructured SKU data
Presents the use of LLMs in brand extraction, organic product labeling, and generalized attribute extraction
Covers the downstream impact of attribute extraction on the customer shopping experience
Explores future plans to use multimodal LLMs for attribute extraction and democratize their use across DoorDash through a centralized model platform

"Lessons in adopting Airflow on Google Cloud"

by Parin Porecha ⸱ Booking ⸱ 7 min read ⸱ 24 Apr

Describes the process of migrating workflows to Airflow on GCP using Google's managed Cloud Composer service
Explores the creation of a local Airflow environment to mimic the remote production environment
Gives an overview of performance tuning with regard to celery.worker_concurrency and other parameters
Presents the use of Dataproc to offload heavy processing and reduce costs
Shares the use of embedded documentation in DAGs and service account impersonation for security

"Transforming Recommendations at ASOS"

by Ed Harris ⸱ ASOS ⸱ 7 min read ⸱ 24 Apr

Introduces the use of transformer technology for fashion recommender systems
Explains how transformers use self-attention and positional awareness to capture customer style and interactions
Shares the development of a transformer recommendations system at ASOS using the Transformers4Rec library
Reports that the new transformer model outperforms the previous model, with a 20% increase in the evaluation metric
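For readers new to the mechanism, here is a minimal, textbook-style sketch of scaled dot-product self-attention plus sinusoidal positional encodings in plain Python. It is not ASOS's model and not the Transformers4Rec API: it uses identity query/key/value projections (a real model learns these), and the toy "item embeddings" are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def add_positions(seq: list[list[float]]) -> list[list[float]]:
    """Add sinusoidal positional encodings so the model can tell order apart."""
    d = len(seq[0])
    return [
        [
            x + (math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
                 else math.cos(pos / 10000 ** ((i - 1) / d)))
            for i, x in enumerate(vec)
        ]
        for pos, vec in enumerate(seq)
    ]


def self_attention(seq: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Each output vector is a weighted average of all input vectors, weighted
    by how similar (scaled dot product) they are to the current position.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


# Toy 2-dimensional embeddings for three items a customer viewed (hypothetical).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(add_positions(history))
print(len(contextual), len(contextual[0]))  # 3 2
```

Each output row is a context-aware version of one item in the browsing history, which is the property a sequential recommender builds on.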

"Unveiling the Essence of Code-Level Extensibility in System Design"

by Keshavpeswani ⸱ Expedia ⸱ 4 min read ⸱ 23 Apr

"Reverse Searching Netflix’s Federated Graph"

Netflix ⸱ 7 min read ⸱ 04 Apr

Describes the development of reverse search functionality within Netflix's federated graph
Explains the use of percolator fields in Elasticsearch to enable reverse searching
Covers the implementation of a percolate indexing pipeline for SavedSearches
Presents the use of reverse search for movie classification and workflow assignment
Discusses potential future uses of reverse search, such as building responsive UIs and GraphQL subscriptions
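The core idea of reverse search is to store queries instead of documents and then ask "which stored queries match this new document?". Here is a minimal in-memory sketch of that idea in Python. It does not use the Elasticsearch API; the SavedSearch and ReverseSearchIndex names and the exact-match field criteria are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedSearch:
    # Hypothetical saved search: every (field, value) pair must match exactly.
    name: str
    criteria: tuple[tuple[str, str], ...]

    def matches(self, doc: dict[str, str]) -> bool:
        return all(doc.get(field) == value for field, value in self.criteria)


class ReverseSearchIndex:
    """Stores queries, then returns the ones that match an incoming document."""

    def __init__(self) -> None:
        self._searches: list[SavedSearch] = []

    def register(self, search: SavedSearch) -> None:
        self._searches.append(search)

    def percolate(self, doc: dict[str, str]) -> list[str]:
        # This sketch checks every stored query; a production percolator
        # indexes the queries so it can narrow down candidates first.
        return [s.name for s in self._searches if s.matches(doc)]


index = ReverseSearchIndex()
index.register(SavedSearch("korean-dramas", (("country", "KR"), ("genre", "drama"))))
index.register(SavedSearch("all-dramas", (("genre", "drama"),)))
index.register(SavedSearch("comedies", (("genre", "comedy"),)))

movie = {"title": "Example Movie", "genre": "drama", "country": "KR"}
print(index.percolate(movie))  # ['korean-dramas', 'all-dramas']
```

Running percolate whenever a movie is created or updated is what lets a system classify it, or assign it to a workflow, as soon as it changes.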

"KubeCon EU 2024 Tech Trends: GitOps, AI Hype, Debuggability & More"

by Jason Johl ⸱ Intuit ⸱ 5 min read ⸱ 01 Apr

Discusses the expansion of GitOps and service mesh as Kubernetes turns 10 years old
Describes the rapid growth of community-driven projects like Backstage, KCL, and Argo, highlighting the importance of community in open source
Explores the continued focus on AI in software applications, with a particular emphasis on the role of Kubernetes in the AI revolution

Thanks for reading Big Tech Digest. If you enjoyed this issue, 🔗 share it with your friends or teammates.

See you in two weeks 👋!