Rasmus Olsson

Azure SQL resource governance microservice architecture

Tags: devops
July 17, 2025

In a microservice environment you can end up with a lot of databases, sometimes because you want hard ownership boundaries, sometimes because it was the easiest way to avoid coupling. Either way, the question that tends to show up is not "should we share compute", it's:

What controls do we actually have to keep cost and performance predictable when we have many databases?

A useful starting point is to think in terms of how compute is allocated. In practice, most setups lean in one of these two directions:

  • Dedicated compute per database or per service: great isolation and a simple mental model, but it can get expensive when many databases are idle most of the time. A "just in case" dedicated allocation that sits quiet for large parts of the day is still something you pay for continuously. A useful variant here is serverless, which can be a great fit for a single database with truly intermittent usage: it scales within a min/max vCore range and can optionally pause when idle. Serverless tends to cost more per vCore-hour than provisioned compute, so it usually only wins when average utilization is low enough that scaling down (and possibly auto-pausing) compensates for the higher unit price. You still get predictable bounds from the per-database min and max, but spend lives within a range, and if you enable auto-pause you also accept some resume and warm-up latency.
  • Shared compute across many databases: one underlying compute budget shared by multiple databases, which tends to fit the microservice "long tail" where a few databases are hot and many are small, spiky, or mostly idle. This is a popular choice cost-wise, because you pay for compute once and let databases share it.
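As a concrete sketch of the serverless variant, this is roughly what creating an intermittent-usage database with auto-pause looks like with the Azure CLI (the resource group, server, and database names are placeholders):

```shell
# Create a serverless General Purpose database that scales between
# 0.5 and 2 vCores and auto-pauses after 60 minutes of inactivity.
# All resource names here are placeholders.
az sql db create \
  --resource-group rg-example \
  --server sql-example \
  --name orders-db \
  --edition GeneralPurpose \
  --compute-model Serverless \
  --family Gen5 \
  --capacity 2 \
  --min-capacity 0.5 \
  --auto-pause-delay 60
```

The min/max capacity pair is what gives you the predictable spend bounds mentioned above; `--auto-pause-delay` is in minutes, and setting it to -1 disables auto-pause if the resume latency is unacceptable.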

This is also where shared compute starts to feel risky in practice. Not because sharing is wrong, but because it only takes one database to change the behavior of the whole environment.

A common example is a new service that looks harmless on paper, but then ships with a heavy query, a migration that runs during peak, or a background job that suddenly does a full-table scan. On a shared compute setup, that workload does not just hurt itself, it competes for the same CPU, memory, workers and I/O as everything else. The symptom you see is rarely "service X is slow", it's more often "a bunch of unrelated services got slower at the same time".

The uncomfortable part is that the cost-efficient model ("share compute") also makes the failure mode broader ("share pain") even though limiting blast radius is often seen as one of the benefits of a microservice architecture.

On Azure SQL (SQL Server), we have a surprisingly useful toolbox for handling shared compute without completely giving up guardrails:

  • Resource governance: Azure SQL Database enforces limits via a governance layer that is based on SQL Server Resource Governor, adapted for the cloud.
  • Azure SQL Database elastic pools, which let many databases share a pool of compute while still giving you controls like per-database min/max (DTU or vCores).
  • The option to move specific databases out of the pool when they stop behaving like "small and spiky" workloads.

This is worth calling out because it is not equally available across engines or cloud providers. A lot of the "elastic pool" story is tightly connected to SQL Server: because the enforcement mechanism comes from Resource Governor concepts, adapted for the cloud, Azure can offer pooled compute plus per-database caps as a first-class product feature.
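You can actually inspect the limits this governance layer applies to a database through the `sys.dm_user_db_resource_governance` DMV, which is specific to Azure SQL Database. A quick way to peek at it from a shell (server, database, credentials, and the selected columns are a sketch; check the DMV's documented columns for your service tier):

```shell
# Query the effective resource governance settings for the current database.
# Server name, database name, and credentials are placeholders.
sqlcmd -S sql-example.database.windows.net -d orders-db \
  -U admin-user -P '<password>' -Q "
SELECT slo_name, primary_max_log_rate, max_db_max_size_in_mb
FROM sys.dm_user_db_resource_governance;"
```

Seeing the service-level-objective name and the concrete caps side by side makes it much easier to reason about why a workload is being throttled.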

Basic examples of limiting blast radius with min/max (DTU / vCore)

The most practical guardrail in an elastic pool is the per-database min/max setting. It caps how much compute a database can use when active, and it can reserve baseline capacity if you set a minimum above zero.

Important detail: the configured min and max apply to all databases in the pool. If you need different caps or guarantees for different services, the usual approach is to split workloads into separate pools (or move an outlier to dedicated compute).
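For reference, those pool-wide per-database limits are set on the pool itself. A hedged Azure CLI sketch, with placeholder names and sizes:

```shell
# Create a Standard 100 eDTU pool where any single database can use at
# most 20 eDTUs when active and reserves nothing when idle (min 0).
# Resource group, server, and pool names are placeholders.
az sql elastic-pool create \
  --resource-group rg-example \
  --server sql-example \
  --name pool-longtail \
  --edition Standard \
  --capacity 100 \
  --db-max-capacity 20 \
  --db-min-capacity 0
```

Note that `--db-max-capacity` and `--db-min-capacity` apply to every database in the pool, which is exactly why differing requirements push you toward multiple pools.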

Below are a few common patterns.

Example 1: Isolate a “risky” workload by putting it in its own pool

If you have a service that runs imports, backfills, or experimental queries, consider placing it in a separate elastic pool with a smaller budget and tighter per-database limits.

Why this helps: its worst day cannot consume the same shared budget as the rest of your estate.

Shape:

  • Long-tail pool: tuned for lots of small, spiky databases
  • Risky/batch pool: smaller pool budget, lower max-per-db
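The two-pool shape above can be sketched with the Azure CLI roughly like this (all names and sizes are placeholders, and `import-db` stands in for whatever database hosts the risky workload):

```shell
# Long-tail pool: larger budget, tuned for many small, spiky databases.
az sql elastic-pool create -g rg-example -s sql-example -n pool-longtail \
  --edition Standard --capacity 200 --db-max-capacity 50 --db-min-capacity 0

# Risky/batch pool: smaller budget, tighter per-database cap.
az sql elastic-pool create -g rg-example -s sql-example -n pool-batch \
  --edition Standard --capacity 50 --db-max-capacity 20 --db-min-capacity 0

# Place the risky workload's database in the batch pool.
az sql db update -g rg-example -s sql-example -n import-db \
  --elastic-pool pool-batch
```

With this split, a runaway import can at worst saturate the 50 eDTU batch pool; the long-tail pool's budget is untouched.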

Example 2: Protect critical workloads with a dedicated budget boundary

If one database is truly on the critical path, the most reliable protection is to give it its own pool (or dedicated compute). This makes both performance and spend easier to reason about.

Why this helps: you avoid critical latency being dependent on what else happens to be active in a shared pool.

Shape:

  • Critical pool: sized for the SLO of the critical workload
  • Long-tail pool: everything else

Example 3: Long-tail defaults plus explicit outlier handling

A common operational setup is to keep one predictable pool for the long tail, then handle exceptions explicitly through placement.

Shape:

  • Pool A (Long tail): most microservice databases, low average usage
  • Pool B (Hot): known high-traffic databases
  • Pool C (Batch/maintenance): jobs, migrations, backfills (optional)

Why this helps: it keeps the default simple, makes outliers obvious, and makes promotions a normal operational decision.

Example 4: A database outgrows its current pool

A service can start small and fit nicely in the long-tail pool, then evolve over time.

Signals:

  • It frequently hits the per-database max in its pool
  • Latency starts correlating with other activity in the pool
  • New requirements appear (for example compliance, isolation, stricter SLOs)

Options:

  • Increase the pool size if the pool is simply lacking headroom
  • Move the database to a different pool that matches the workload shape
  • Move it to dedicated compute if you want tighter isolation and independent scaling
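Each of those options maps to a straightforward CLI operation. A sketch with placeholder names (and note that assigning a standalone service objective is what moves a database out of its pool):

```shell
# Option 1: give the existing pool more headroom.
az sql elastic-pool update -g rg-example -s sql-example -n pool-longtail \
  --capacity 200

# Option 2: move the database to a pool that better fits its shape.
az sql db update -g rg-example -s sql-example -n orders-db \
  --elastic-pool pool-hot

# Option 3: take it out of the pool entirely by assigning a standalone
# service objective (dedicated compute).
az sql db update -g rg-example -s sql-example -n orders-db \
  --service-objective S3
```

Because these are online operations, promotion out of the long-tail pool can be a routine change rather than a migration project.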

The main idea is that pools are great for the long tail, but it is normal to adjust placement as requirements evolve.

Further reading

If you want to go deeper into the details, here are the official docs I tend to reference:

Happy coding!
