Big Tech Digest #12: MySQL migration at Flipkart, Java 21 Key Features at Capital One, 10 GitHub Security Best Practices at Snyk, Cassandra Tuning at DoorDash and more!

2024-02-09 | Big Tech Digest #12


Happy Friday 👋!

This week, I’m presenting 7 notable articles from Meta, Snyk, Flipkart, Capital One, and DoorDash, all published since the last issue of Big Tech Digest. I hope you enjoy it!

One thing you could do to help me grow Big Tech Digest is to share it with your connections.

Any share is much appreciated 🙏!


Without further ado, let’s get started!

// 🏆 Top 3 must-reads

1. "The Great Migration, MySQL Edition"

by Hazarath Reddy Nukaraju ⸱ Flipkart ⸱ 8 min read ⸱ 6 Feb

Explores the challenges and requirements for migrating MySQL clusters to a new data center
Gives an overview of the hardware performance benchmarking process
Covers the process of upgrading MySQL version 5.6 clusters to version 5.7
Shares the activities involved in migrating clusters to the new data center
Explains the strategy for mitigating challenges and reducing application team involvement
Goes through the cutover process and the option to fall back to the old data center

2. "Java 21: Key Features and Improvements"

Capital One ⸱ 13 min read ⸱ 30 Jan

Gives an overview of Java vs. Scala vs. Kotlin
Discusses how Java 10’s ‘var’ keyword reduces verbosity
Goes through overcoming type-checking challenges with instanceof and pattern matching
Describes how Java records reshape data classes
and more!
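For anyone who hasn’t tried these features yet, here is a minimal sketch of our own (not code from the Capital One article) that combines ‘var’, records, and pattern matching in one place. The Shape, Circle, and Square types are hypothetical, and the record patterns inside switch assume Java 21, where that feature was finalized.

```java
import java.util.List;

public class Java21Features {
    // Hypothetical sealed hierarchy: the compiler knows every permitted
    // implementation, so the switch in area() needs no default branch.
    sealed interface Shape permits Circle, Square {}

    // Records generate the constructor, accessors, equals, hashCode and toString.
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    // Pattern matching for switch with record deconstruction (Java 21).
    static double area(Shape shape) {
        return switch (shape) {
            case Circle(double r) -> Math.PI * r * r;
            case Square(double s) -> s * s;
        };
    }

    // Pattern matching for instanceof (Java 16+): test, cast and bind in one step.
    static String describe(Object o) {
        if (o instanceof Circle c && c.radius() > 1.0) {
            return "large circle";
        }
        return "something else";
    }

    public static void main(String[] args) {
        // 'var' (Java 10+) infers List<Shape> from the right-hand side.
        var shapes = List.<Shape>of(new Circle(1.0), new Square(2.0));
        for (var shape : shapes) {
            System.out.println(shape + " area=" + area(shape));
        }
        System.out.println(describe(new Circle(2.0)));
    }
}
```

Sealing the interface is what lets the compiler check that the switch covers every case; adding a new record without updating area() becomes a compile error rather than a runtime surprise.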

3. "10 GitHub Security Best Practices"

by Brian Vermeer ⸱ Snyk ⸱ 13 min read ⸱ 5 Feb

Describes the importance of enabling and enforcing 2FA for GitHub
Covers the best practices for limiting access to repositories
Explains the risk of storing credentials as code/config in GitHub
Shares the importance of connecting repositories to Snyk and scanning for vulnerabilities
Provides an overview of branch protection rules and how to set them up
Discusses the significance of rotating SSH keys and personal access tokens
Explores the importance of automatically updating dependencies
Introduces the use of private repositories for sensitive data

// 📬 Optional reads

a.k.a. The Best of the Rest!

"Data lake vs data warehouse: Comparison"

Capital One ⸱ 10 min read ⸱ 30 Jan

Presents the use cases for data lakes vs data warehouses
Explores the schema of data lakes vs data warehouses
Describes how data is processed in data lakes and data warehouses
Covers the benefits and challenges of data lakes and data warehouses

"Automated Backup Restore Validation"

by Isha Aggarwal ⸱ Flipkart ⸱ 6 min read ⸱ 5 Feb

Describes the architecture and design of the Automated Restore Validation process
Explores the challenges and solutions in optimizing key parameters for the validation process
Presents the future roadmap for extending the validation to other datastores and implementing data certification
Shares insights into the critical role of BRaaS in ensuring that backups of Flipkart’s various MySQL clusters are valid and restorable

"Improving machine learning iteration speed with faster application build and packaging"

by Barys Skarabahaty, Stanislau Hlebik ⸱ Meta ⸱ 5 min read ⸱ 29 Jan

Discusses how tackling slow builds and inefficient packaging and distribution of executable files in ML/AI development led to a double-digit percentage reduction in overhead
Describes efforts to streamline the build graph, mitigate build non-determinism, and introduce incrementality support for packaging and distribution
Explores the sources of build non-determinism and the implementation of non-determinism mitigation within Remote Execution (RE)
Shares the implementation of the Content Addressable Filesystem (CAF) for packaging and fetching of Python executables, and the use of Btrfs as the filesystem
Goes through further ML iteration improvements, including optimizing executable parts on demand and enforcing uniform revisions for improved cache hit ratios
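The core idea behind content addressing is easy to show in miniature: each blob is stored under the hash of its bytes, so identical content gets the same address and is kept only once, and a new build only has to ship blobs whose hashes are missing. The toy in-memory store below is our own illustration under that assumption; it is not Meta’s CAF, which is a real filesystem, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

public class ContentAddressableStore {
    // Blobs keyed by the hex-encoded SHA-256 digest of their content.
    private final Map<String, byte[]> blobs = new HashMap<>();

    static String digest(byte[] content) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha256.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Returns the blob's address; identical content is stored only once.
    String put(byte[] content) {
        String key = digest(content);
        blobs.putIfAbsent(key, content.clone());
        return key;
    }

    // Returns a copy of the blob, or null if the address is unknown.
    byte[] get(String key) {
        byte[] content = blobs.get(key);
        return content == null ? null : content.clone();
    }

    int size() {
        return blobs.size();
    }

    public static void main(String[] args) {
        var store = new ContentAddressableStore();
        String a = store.put("libfoo.so v1".getBytes(StandardCharsets.UTF_8));
        String b = store.put("libfoo.so v1".getBytes(StandardCharsets.UTF_8));
        String c = store.put("libfoo.so v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.equals(b)); // true: same content, same address
        System.out.println(a.equals(c)); // false: changed content, new address
        System.out.println(store.size()); // 2: the duplicate was stored once
    }
}
```

This is also why non-determinism hurts: if a rebuild of unchanged sources produces different bytes, the hash changes and the blob can no longer be reused.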

"Cassandra Unleashed: How We Enhanced Cassandra Fleet’s Efficiency and Performance"

by Seed Zeng ⸱ DoorDash ⸱ 17 min read ⸱ 30 Jan

Covers the importance of designing an effective Cassandra schema for performance and scalability
Explores the impact of consistency levels on performance and data accuracy in Cassandra
Describes the benefits of tuning garbage collection in Cassandra for improved throughput and latency
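As a quick refresher on the consistency-level trade-off (general Cassandra background, not taken from the DoorDash post): with replication factor RF, QUORUM means floor(RF / 2) + 1 replicas, and reads are guaranteed to overlap at least one replica that acknowledged the write when R + W > RF. The sketch below assumes a single data center; LOCAL_QUORUM would use the local data center’s RF instead, and the class and method names are hypothetical.

```java
public class CassandraConsistency {
    // QUORUM is a majority of replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Read and write replica sets must overlap in at least one replica
    // when R + W > RF, so a read sees the latest acknowledged write.
    static boolean readsOverlapWrites(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf); // 2 when RF = 3
        System.out.println("QUORUM for RF=" + rf + ": " + q);
        System.out.println("QUORUM reads + QUORUM writes: " + readsOverlapWrites(q, q, rf)); // true
        System.out.println("ONE reads + ONE writes: " + readsOverlapWrites(1, 1, rf)); // false
        System.out.println("ONE reads + ALL writes: " + readsOverlapWrites(1, rf, rf)); // true
    }
}
```

Lower levels such as ONE touch fewer replicas and so cut latency, at the cost of possibly returning stale data; that is the performance versus accuracy tension the article explores.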

That’s it for today! I hope you enjoyed this issue. Let me know in the comments what you have learned 💡!

Thanks for reading Big Tech Digest. Please don’t forget to 🗣️ spread the word, and see you in two weeks!

Delivered bi-weekly to your inbox, Big Tech Digest brings you a collection of links to the latest engineering blog posts from 300+ Big Tech companies and startups like Airbnb, Uber, Netflix, and Meta. Aimed at Software Engineers and AI/ML folks at any level, Big Tech Digest is focused on interesting solutions to engineering problems that tech companies come across. No marketing or non-tech stuff.

Subscribe now to receive a new issue in your inbox every two weeks!
