What Truly Limits System Scalability and How Engineers Balance Speed and Stability

Scalability has long been viewed as a purely technical challenge — a matter of architecture, infrastructure, and optimization.

Yet as digital products expand to serve millions of users, it’s becoming clear that the real limits of scalability are not purely technological. They often stem from architectural decisions, engineering culture, and the human trade-offs that shape every system’s design.

According to Google’s Site Reliability Engineering report (2022) and McKinsey’s research on technology resilience, over 60% of major system failures occur not because of resource shortages but because of complexity — unpredictable dependencies and hidden interactions within distributed architectures. Meanwhile, companies like Netflix and Meta emphasize the importance of graceful degradation: building systems that adapt under stress instead of collapsing under it.

In this interview, Sergey Sidorov, software engineer and technical leader at Meta, and an expert in large-scale infrastructure and reliability systems, shares an analytical perspective on what truly limits system scalability. He explains how engineers strike the balance between speed and stability — and why architectural simplicity remains the foundation of resilience in a world where complexity grows faster than performance.

When systems begin to grow, what actually limits scalability — technology, architecture, or people?

Technology itself is rarely the main limit to scalability. More often, the real constraint is people and how the work is organized. This is where Conway’s law comes into play: a system’s architecture inevitably mirrors the structure of the team that builds it. If processes are fragmented and communication is complicated, the system will scale in the same fragmented and inefficient way.

As the business grows, the key challenge becomes enabling teams to move in sync and make decisions without excessive coordination. Scalability is determined not by team size, but by architectural clarity and how effectively people can collaborate. We see this in examples like Telegram and early WhatsApp, which supported massive user bases with remarkably small engineering teams.

Was there a moment in your career when a system unexpectedly hit its limit?

First, it’s important to clarify what we mean by a system’s limit. Typically, this refers to the maximum volume of operations or requests a system can handle — whether it’s the number of concurrent video viewers or the amount of background computation it can sustain. In this context, scalability describes the system’s ability to serve more load in proportion to the resources added.

In practice, this is where a paradox emerges: adding resources does not guarantee improved scalability. In some cases, it provides no benefit at all.

At Facebook, I worked on scaling one of the key components of the internal infrastructure — a large distributed job queue responsible for absorbing traffic spikes and offloading heavy or delayed workloads. The original architecture mirrored the organizational structure of early Facebook: components communicated on an “everyone talks to everyone” basis. At a certain point, communication overhead grew so large that adding more servers no longer improved performance.

Only by rethinking the interaction model and redesigning the architecture were we able to continue scaling — and ultimately operate the system with fewer resources than before.
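The effect Sidorov describes — where adding servers stops helping once coordination overhead dominates — can be illustrated with Gunther’s Universal Scalability Law. The model and the coefficient values below are not from the interview; they are a hedged sketch of how contention and crosstalk costs cap, and eventually reverse, the benefit of adding nodes:

```python
def usl_throughput(n, sigma=0.05, kappa=0.001):
    """Relative throughput of n servers vs. one server (Universal Scalability Law).

    sigma: contention cost (work that serializes, e.g. a shared queue head).
    kappa: crosstalk cost (pairwise coordination, the "everyone talks to
           everyone" pattern). Both values here are illustrative assumptions.
    """
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

if __name__ == "__main__":
    # With kappa > 0, throughput peaks and then declines as n grows:
    for n in (1, 8, 16, 32, 64, 128):
        print(f"{n:4d} servers -> {usl_throughput(n):5.1f}x throughput")
```

With these assumed coefficients, throughput peaks around 32 servers and then falls — adding a 64th or 128th server makes the system slower, which matches the paradox described above: past the peak, only reducing the coordination term (redesigning the interaction model) restores scalability.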

Why do companies often overvalue speed and undervalue stability?

This imbalance is largely driven by the economic realities of modern technology companies. Startups operate with short strategic horizons: when the company’s survival depends on the next funding round, long-term investments in reliability can feel like a luxury. The logic is straightforward — if you might not make it to the next quarter, stability naturally falls down the priority list.

There’s also the competitive dynamic of today’s tech market. We are in a period of rapid digital expansion, where most players are young and fast-moving. In this environment, speed becomes a tool for capturing market share. Users may tolerate occasional issues, but they are far less forgiving when products ship slowly. As a result, companies treat delivery velocity as a primary driver of competitiveness.

Perception plays a role as well. Speed is highly visible: release frequency, feature delivery rates, growth metrics. Stability, on the other hand, only becomes noticeable in moments of failure. When everything “just works,” investments in reliability appear invisible and offer no immediate payoff — which naturally pushes them lower on the priority ladder.

It’s important to note that this approach isn’t inherently wrong. In the early stages, companies do have to make trade-offs between quality and time-to-market. But over time, it becomes clear that stability is not the opposite of speed — it is a prerequisite for sustaining it. You can move fast only when the underlying architecture can absorb the consequences of that speed.

Which architectural principles help scale systems without losing control or predictability?

One of the core principles is to keep the architecture as compact and manageable as possible. Adding complexity does not inherently make a system more scalable; in many cases, it introduces overhead that is difficult to anticipate and eventually becomes a source of unpredictability.

A good illustration is Stack Overflow — one of the most heavily used platforms among software engineers. At its peak, it operated on a minimal number of servers, without sprawling microservice landscapes or geographically distributed shards. Its architecture prioritized clarity, predictability, and operational efficiency.

The reality is that most products can run successfully on simple infrastructure — in some cases, even on a single server with a hot standby. Only organizations operating at the scale of Google, Netflix, or Meta genuinely require highly distributed, complex architectures. For the majority of companies, overengineering introduces more long-term risks than benefits.

What most often prevents engineers from spotting scalability limits early?

The primary obstacle is insufficient observability. In many organizations, monitoring is designed mainly around incident response: metrics focus on detecting and resolving immediate production issues. This approach addresses short-term problems but offers little insight into long-term system behavior.

As a result, engineers may notice individual symptoms but miss the underlying trends — slow regressions, gradual increases in latency, or declining algorithmic efficiency. These issues unfold over weeks or months, not minutes, and are therefore invisible to purely reactive monitoring.

This is where SLOs become essential. They establish quality targets measured over extended periods and help surface trends that traditional monitoring cannot reveal. In practice, SLOs allow teams to identify early signs of degradation long before they escalate into full-scale incidents.
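The mechanics of an SLO measured over an extended period can be sketched as an error-budget calculation. The 99.9% target and the request counts below are illustrative assumptions, not figures from the interview:

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the period's error budget still unspent (negative = SLO breached).

    slo_target: availability target over the window, e.g. 0.999 for 99.9%.
    """
    allowed_failures = total_requests * (1 - slo_target)
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures

# Hypothetical 30-day window: 10M requests at a 99.9% target allow
# 10,000 failures; 4,000 observed failures leave 60% of the budget.
budget = error_budget_remaining(0.999, total_requests=10_000_000, failed_requests=4_000)
```

Tracking this number week over week is what surfaces slow degradation: a budget that drains a little faster each period signals a trend long before any single incident fires an alert.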

Looking ahead, what principles of scalability will define successful systems in the next few years?

In the coming years, the key advantage won’t come from mastering a particular technology stack, but from an engineering team’s ability to understand how their systems behave in real-world conditions. The growing complexity of distributed architectures makes it increasingly difficult to rely solely on established patterns — they age quickly and rarely fit the unique characteristics of a specific product.

The first principle is engineering curiosity and critical thinking. Modern systems are too complex to operate on assumptions. It becomes essential to continuously challenge inherited decisions, test hypotheses, and treat the system as an object of investigation rather than a collection of predefined components.

The second principle is observability as a foundation, not a supporting function. Observability is evolving from a troubleshooting tool into a core part of engineering culture. Systems need mechanisms that surface long-term trends: shifts in latency, the emergence of hidden dependencies, and changes in resource distribution. In practice, this means building the ability to “converse” with the system — to ask meaningful questions and receive meaningful answers.
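One minimal way to surface the long-term shifts mentioned above is to fit a trend line to periodic latency samples rather than alerting on individual spikes. The sketch below uses a plain least-squares slope over weekly p99 samples; the data and units (milliseconds) are assumed for illustration:

```python
def trend_slope(samples):
    """Least-squares slope of samples per index step (e.g. ms per week)."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic weekly p99 latency (ms): noisy week to week, but creeping upward.
weekly_p99_ms = [120, 118, 123, 125, 124, 129, 131, 134]
slope = trend_slope(weekly_p99_ms)  # ~ +2.1 ms per week
```

No single week here looks alarming, yet the fitted slope shows a steady regression of roughly two milliseconds per week — exactly the kind of signal reactive, incident-focused monitoring misses.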

The third principle is evolutionary architecture. Successful systems will not be static; they will be adaptive. This requires moving away from rigid structures that are difficult to change and toward architectures that can evolve incrementally without sacrificing predictability or control.

Finally, simplicity will remain a strategic asset. As systems grow more complex, the teams that can preserve architectural minimalism will have a significant advantage. Simplicity reduces the cost of change, lowers organizational overhead, and makes future scaling more predictable.

These principles are particularly important for understanding how distributed systems evolve and behave across different layers. A step-by-step architectural analysis — an approach I was able to explore in depth in the Getrafty book — helps uncover the real mechanics of scalability: how hidden dependencies emerge, how performance characteristics shift under load, and which architectural decisions enable sustainable growth versus creating long-term risks.

For engineering teams, this depth of analysis is becoming not an optional advantage but a fundamental part of professional practice, essential for building predictable and reliable large-scale systems.