Subsetting is a well-established technique for scaling a distributed system once the management overhead of a fully connected mesh is no longer viable. At Datadog, where hundreds of thousands of processes are connected with gRPC, we have been suffering from this overhead for quite some time, and around one year ago we decided to do something about it. Because gRPC currently has no native way to do subsetting, we started extending gRPC in-house to fit our case. This presentation tells the story of the many different ways we attempted to deal with subsetting in gRPC and the lessons we learned. We will cover why subsetting is beneficial, the different implementations we tried, ways to eliminate the imbalance generated by subsetting, and how subsetting is helping us use smart load balancing algorithms to manage overhead and drive reliability. We will finish with an update on our efforts to upstream these changes through the gRPC RFC process.
Thursday July 17, 2025 12:05pm - 12:25pm PDT Coast Live Oak