Service decomposition, circuit breakers, sagas, event-driven communication, and observability.
Welcome everyone. Today we're diving into microservices patterns and the architectural principles for building resilient distributed systems. We'll explore when microservices make sense, the key patterns that make them work, and critically, the pitfalls to avoid. Microservices aren't just a technical choice; they're an organizational one, and we'll unpack what that really means. Whether you're considering breaking up a monolith or already running services in production, you'll walk away with practical patterns you can apply immediately. Let's get started.
So what exactly are microservices? At their core, they structure your application as a collection of small, autonomous services, where each one owns a single business capability. The key insight here is that the goal isn't to make services physically small; it's to enable independent change. Looking at the four cards on screen: First, single responsibility means each service handles one business capability, like orders or payments, and nothing else. Second, independent deployment lets you ship any service without coordinating with other teams. Third, decentralized data means each service manages its own database with no shared schemas. And fourth, failure isolation ensures that if the payment service crashes, your catalog stays up. Failures stay contained.
The decision to use microservices isn't really about technology. It's about organizational scaling. For most teams starting out, a monolith is absolutely the right choice. Let's look at the table comparing the two approaches. For team size, monoliths work great with one to ten developers, while microservices require multiple autonomous teams. Deployment is all-or-nothing in a monolith versus independent per service. Data consistency gives you ACID transactions in a monolith but requires eventual consistency across services. Debugging is a single stack trace versus distributed tracing across the network. And notice the operational cost difference: one server and deploy pipeline versus dozens of pipelines, service mesh, and monitoring infrastructure. The callout here reinforces Martin Fowler's MonolithFirst advice: build a well-structured monolith, identify natural boundaries from real usage, then extract services only where you have proven needs.
How do you decide where to draw service boundaries? Looking at these four cards, we have three proven approaches and one anti-pattern. First, decompose by business domain using Domain-Driven Design bounded contexts. Orders, payments, shipping, and inventory are separate domains with clear ownership. Second, split by subdomain: your core domain is your competitive advantage, supporting subdomains help the core, and generic subdomains are commodity functions you can buy or use as SaaS. Third, the strangler fig pattern lets you incrementally replace monolith pieces by routing traffic to new services one endpoint at a time. The old system gradually shrinks until it disappears. And here's the anti-pattern: splitting by layer, like separating into UI service, logic service, and data service. This creates tight coupling across layers, and changes still require coordinated deploys. Avoid this one.
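To make the strangler fig concrete, here's a minimal routing sketch. The service names, URLs, and the idea of a migrated-prefix table are all illustrative assumptions, not part of the talk: migrated endpoints route to new services, and every other path still falls through to the legacy monolith.

```typescript
// Strangler-fig routing sketch (hypothetical service names and URLs):
// endpoints already migrated to new services are listed explicitly;
// everything else still goes to the legacy monolith.
type Target = { service: string; baseUrl: string };

const migrated: Record<string, Target> = {
  "/orders": { service: "order-service", baseUrl: "http://orders.internal" },
  "/payments": { service: "payment-service", baseUrl: "http://payments.internal" },
};

const monolith: Target = { service: "legacy-monolith", baseUrl: "http://monolith.internal" };

// Route by longest matching path prefix; unmigrated paths hit the monolith.
function route(path: string): Target {
  const prefix = Object.keys(migrated)
    .filter((p) => path === p || path.startsWith(p + "/"))
    .sort((a, b) => b.length - a.length)[0];
  return prefix ? migrated[prefix] : monolith;
}
```

As migration proceeds, you add prefixes to the table; when the table covers everything, the monolith entry disappears.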
The API gateway serves as the single entry point for all external clients. Looking at the diagram, you can see mobile apps, web apps, and partner APIs all flow through the gateway, which then handles authentication, rate limiting, and routing to the appropriate backend services. The table breaks down why each concern belongs at the gateway level. Authentication validates tokens so downstream services can trust the verified identity. Rate limiting with token buckets protects all services uniformly. Request aggregation fans out to multiple services and merges responses, reducing client round-trips. Protocol translation lets clients use simple REST while services communicate efficiently with gRPC internally. And load shedding rejects excess traffic with 429 or 503 status codes to prevent cascading overload across your entire system.
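Of the gateway concerns listed, rate limiting with token buckets is the easiest to sketch in isolation. This is a minimal, single-process illustration, not a production gateway; the capacity and refill numbers are arbitrary assumptions.

```typescript
// Token-bucket rate limiter sketch, as a gateway might apply per client.
// A bucket refills continuously; each request consumes one token, and an
// empty bucket means the request should be rejected with 429.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,     // burst size
    private readonly refillPerSec: number, // sustained rate
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if it should be shed.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In a real gateway you'd keep one bucket per client key (API token, IP) and return 429 when `tryConsume` fails.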
In dynamic environments where services scale up and down, hardcoded URLs simply don't work. Service discovery solves the fundamental question: how does Service A find Service B? The sequence diagram shows the flow. Payment service instances register themselves with the service registry, providing their IP and port. When the order service needs to call payment, it queries the registry and receives a list of all healthy instances. The order service then load balances across them. Notice the registry performs health checks every ten seconds to maintain an accurate view. The table compares three approaches: client-side discovery where the client queries the registry directly, like Netflix Eureka. Server-side discovery where a load balancer queries on the client's behalf, like Kubernetes Services. And DNS-based discovery where the registry returns DNS records with TTL-based refresh, like Consul DNS.
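The register-and-lookup flow from the sequence diagram can be sketched as an in-memory registry. Real registries like Eureka or Consul handle replication and active health checks; here health is approximated by a heartbeat TTL, and all names are illustrative.

```typescript
// In-memory service registry sketch. Instances register with host, port,
// and a heartbeat timestamp; lookups return only instances whose last
// heartbeat falls inside the health-check window (TTL).
type Instance = { host: string; port: number; lastHeartbeat: number };

class ServiceRegistry {
  private services = new Map<string, Map<string, Instance>>();

  constructor(private readonly ttlMs: number = 30_000) {}

  // Called by service instances on startup and on each heartbeat.
  register(name: string, host: string, port: number, now: number = Date.now()): void {
    const key = `${host}:${port}`;
    const instances = this.services.get(name) ?? new Map<string, Instance>();
    instances.set(key, { host, port, lastHeartbeat: now });
    this.services.set(name, instances);
  }

  // Called by clients; the caller load balances across the result.
  lookup(name: string, now: number = Date.now()): Instance[] {
    const instances = this.services.get(name);
    if (!instances) return [];
    return [...instances.values()].filter((i) => now - i.lastHeartbeat <= this.ttlMs);
  }
}
```

This is the client-side discovery shape: the order service would call `lookup("payment")` and pick an instance itself.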
When a downstream service is failing, continuing to call it only makes things worse. The circuit breaker pattern stops this cascade by failing fast. Looking at this TypeScript implementation, the circuit breaker maintains three states: closed, open, and half-open. It tracks failure counts and timestamps. In the closed state, requests flow normally. After five failures, it trips to open and immediately rejects all requests with an error, avoiding unnecessary network calls. After the reset timeout, it enters half-open state to test if the downstream service has recovered. Notice the call method wraps your async function. On success, it resets the failure counter and closes the circuit. On failure, it increments failures and records the timestamp. Once the threshold is reached, it opens the circuit. This protects your system from wasting resources on calls that will fail anyway.
Microservices can't use traditional ACID transactions across service boundaries. The saga pattern solves this by coordinating multi-service operations as a sequence of local transactions, each with compensating actions for rollback. The diagram shows a typical order flow: create order, reserve inventory, charge payment, then ship. If payment fails, we run compensating transactions in reverse: release the inventory reservation, then cancel the order. The table compares two saga implementations. Choreography uses event-driven communication where services react to events, giving you loose coupling but making the flow harder to trace. Orchestration uses a central coordinator that explicitly calls each step, providing better visibility but tighter coupling. The callout addresses a common question: why not use two-phase commit? Because 2PC requires holding locks until all participants confirm, which creates unacceptable latency and coupling in distributed systems with network partitions.
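The orchestration variant can be sketched as a loop over steps, each pairing a local transaction with its compensating action. The step names mirror the order flow on the slide; the runner itself is a simplified assumption (real orchestrators persist saga state so they survive crashes).

```typescript
// Orchestration-style saga sketch: run steps in order; on failure, run
// the compensating actions of completed steps in reverse order.
type SagaStep = {
  name: string;
  action: () => Promise<void>;      // local transaction
  compensate: () => Promise<void>;  // rollback for that transaction
};

async function runSaga(steps: SagaStep[]): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      log.push(`done:${step.name}`);
      completed.push(step);
    } catch {
      log.push(`failed:${step.name}`);
      // Compensate in reverse: release inventory before cancelling order.
      for (const done of completed.reverse()) {
        await done.compensate();
        log.push(`compensated:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```

With steps `createOrder`, `reserveInventory`, `chargePayment`, a payment failure produces exactly the rollback described above: release the reservation, then cancel the order.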
Event-driven communication decouples services in both time and space. Services publish events to a message broker without knowing or caring who consumes them. The sequence diagram illustrates this beautifully. Order service publishes an OrderCreated event to the broker. The broker then delivers it to multiple consumers: payment service, inventory service, and potentially others. Each consumer processes the event and may publish their own events. Payment publishes PaymentProcessed, inventory publishes InventoryReserved, and the cycle continues. Looking at the two cards below: at-least-once delivery means brokers guarantee delivery but may send duplicates, so every consumer must be idempotent using deduplication keys. The outbox pattern solves a critical problem: you write the event to a local outbox table in the same database transaction as your business change, then a separate process publishes from the outbox. This guarantees consistency without distributed transactions.
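The outbox pattern deserves a concrete shape. This sketch fakes the database with in-memory arrays, so the "same transaction" guarantee is only simulated; in a real system both inserts happen in one SQL transaction and the relay polls the outbox table.

```typescript
// Outbox pattern sketch with an in-memory "database". The order row and
// its OrderCreated event are written together (in a real system: one SQL
// transaction); a separate relay later publishes pending outbox rows.
type OutboxRow = { id: string; type: string; payload: unknown; published: boolean };

const db = {
  orders: [] as { id: string; total: number }[],
  outbox: [] as OutboxRow[],
};

// Real version: INSERT INTO orders ...; INSERT INTO outbox ...; COMMIT;
function createOrder(id: string, total: number): void {
  db.orders.push({ id, total });
  db.outbox.push({
    id: `evt-${id}`,
    type: "OrderCreated",
    payload: { id, total },
    published: false,
  });
}

// Relay process: hand pending rows to the broker, then mark them published.
// Note the ordering: mark only after a successful publish, which means a
// crash in between causes a duplicate -- hence idempotent consumers.
function relayOutbox(publish: (row: OutboxRow) => void): number {
  const pending = db.outbox.filter((r) => !r.published);
  for (const row of pending) {
    publish(row);
    row.published = true;
  }
  return pending.length;
}
```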
Each microservice must own its data store. No other service can access it directly, only through the service's API. This is the hardest rule to follow and the most important. The table shows four patterns and their trade-offs. Private database is the default for every service, with the trade-off being no cross-service queries. Shared database might be used during legacy migration, but creates coupling where schema changes break consumers. CQRS separates read and write models for read-heavy workloads, adding complexity. Event sourcing provides audit trails and replay capability but has a steep learning curve. The danger callout emphasizes the shared database anti-pattern: if two services read and write the same tables, they're not independent. A schema migration in one service breaks the other. You've created a distributed monolith, which gives you the worst of both worlds: all the complexity of microservices with none of the benefits.
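To illustrate the CQRS row of the table: the write side emits events, and a separate read model is projected from them. The event shapes and the `OrderSummary` view here are illustrative assumptions, not a prescribed schema.

```typescript
// CQRS sketch: a denormalized read model projected from write-side
// events, optimized for queries rather than transactional updates.
type OrderEvent =
  | { type: "OrderCreated"; orderId: string; total: number }
  | { type: "OrderShipped"; orderId: string };

type OrderSummary = { orderId: string; total: number; status: "created" | "shipped" };

// Replaying the event stream rebuilds the view from scratch -- the same
// mechanism event sourcing uses for audit and replay.
function project(events: OrderEvent[]): Map<string, OrderSummary> {
  const view = new Map<string, OrderSummary>();
  for (const e of events) {
    if (e.type === "OrderCreated") {
      view.set(e.orderId, { orderId: e.orderId, total: e.total, status: "created" });
    } else {
      const row = view.get(e.orderId);
      if (row) row.status = "shipped";
    }
  }
  return view;
}
```

The complexity trade-off the table mentions is visible even here: the read model lags the write side until projection runs, which is exactly the eventual consistency you sign up for.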
In a monolith, a simple stack trace shows the full request path. But in microservices, a single user request might touch five to ten different services. Without distributed tracing, debugging becomes pure guesswork. Looking at the terminal output, we see a traced request with ID abc-123. The API gateway handles routing and auth in 12 milliseconds. Order service creates the order in 33 milliseconds. Inventory service takes 75 milliseconds to reserve stock, shown in yellow because it's the slowest span. Payment processes in 60 milliseconds, and notification sends the email in 15. Total end-to-end latency is 195 milliseconds. The four cards break down the observability pillars: traces follow a single request to find the slow service. RED metrics track request rate, error rate, and duration percentiles per service. Structured JSON logs with correlation IDs aggregate in your logging platform. And alerts should fire on symptoms like latency spikes, not root causes.
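The trace analysis above can be sketched with a toy span model. Real systems use OpenTelemetry or similar; this sketch assumes sequential spans, which matches the example trace but not fan-out calls, and the span shape is invented for illustration.

```typescript
// Toy span model in the spirit of trace abc-123: each span carries the
// shared trace ID, a service name, and a duration. The slowest span is
// the first place to look when debugging latency.
type Span = { traceId: string; service: string; durationMs: number };

function slowestSpan(spans: Span[]): Span {
  return spans.reduce((worst, s) => (s.durationMs > worst.durationMs ? s : worst));
}

function totalLatency(spans: Span[]): number {
  // Assumes spans run sequentially, as in the example trace above;
  // concurrent spans would need start/end timestamps instead.
  return spans.reduce((sum, s) => sum + s.durationMs, 0);
}
```

Feeding in the five spans from the slide reproduces the numbers narrated above: 195 ms total, with inventory's 75 ms as the slowest span.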
Deploying microservices safely requires strategies that limit blast radius. Every deployment is a potential incident, so we need to reduce the risk. The table compares four strategies by how they work, risk level, and rollback speed. Rolling updates replace instances one at a time with low risk and rollback in seconds. Blue-green deployments run two identical environments and switch traffic instantly between them, giving very low risk and instant rollback. Canary deployments route just 5 percent of traffic to the new version, monitor metrics, then gradually expand. This also offers very low risk with instant rollback. Feature flags deploy code in the off state, then toggle on per user or group, providing the lowest risk and instant rollback without redeployment. The callout recommends combining canary and feature flags: deploy with canary routing to 5 percent of traffic, and within that canary, use feature flags to enable new behavior. If metrics degrade, kill the flag instantly without any redeploy needed.
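The feature-flag half of that combination can be sketched as a stable percentage rollout: hash the user ID to a bucket and enable the flag for buckets below the rollout percentage. The hash function here is a deliberately simple placeholder, not what a real flag service uses.

```typescript
// Percentage-based feature flag sketch: hash the user ID to a stable
// bucket in [0, 100) so the same user always gets the same decision.
function bucket(userId: string): number {
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % 100;
}

function isEnabled(userId: string, rolloutPercent: number): boolean {
  return bucket(userId) < rolloutPercent;
}

// Killing the flag is just setting rolloutPercent to 0 -- no redeploy.
```

Stability matters: because the bucket is derived from the user ID rather than rolled per request, a user inside the 5 percent canary stays inside it as you expand.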
Let's talk about the common pitfalls that teams encounter. Looking at these four cards: First, premature decomposition. Splitting into services before you understand domain boundaries leads to wrong cuts. Start with a monolith and extract services only when boundaries become clear from real usage. Second, the distributed monolith. If your services must deploy together, share databases, or form long synchronous chains, you've built a monolith with added network latency. The test is simple: can you deploy one service alone? Third, ignoring idempotency. Network retries and message broker redelivery guarantee that every operation will run more than once. You must design every endpoint and consumer to be safely repeatable. And fourth, ignoring data consistency. You can't assume ACID transactions across services. Embrace eventual consistency, implement sagas properly, and design your system for convergence even when individual operations fail or arrive out of order.
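The idempotency pitfall has a standard remedy: deduplicate on a message ID before processing. This sketch keeps the dedup set in memory; in practice it would be a database table with a unique constraint or a cache with a TTL, as implied by the dedup-keys advice earlier.

```typescript
// Idempotent consumer sketch: at-least-once delivery guarantees
// duplicates, so remember processed message IDs and skip repeats.
class IdempotentConsumer {
  private processed = new Set<string>();
  handled = 0;

  // Returns true if the message was processed, false if it was a
  // duplicate and was skipped.
  handle(messageId: string, process: () => void): boolean {
    if (this.processed.has(messageId)) return false; // duplicate: skip
    process();                     // run the side effect exactly once
    this.processed.add(messageId); // then record the dedup key
    this.handled += 1;
    return true;
  }
}
```

Note the crash window: if the process dies between the side effect and recording the key, the message replays. Durable implementations record the key and the side effect in one transaction.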
Let's wrap up with the key takeaways. First, start with a modular monolith. Extract services only when you have a proven scaling need or a clear team boundary. Don't decompose prematurely. Second, each service owns its data. No shared databases. Communicate through well-defined APIs and events. Third, design for failure from day one. Implement circuit breakers, retries with exponential backoff, idempotent consumers, and dead letter queues for poison messages. And fourth, invest in observability first, not as an afterthought. Distributed tracing, structured logging, and RED metrics are prerequisites for operating microservices in production. Without them, you're flying blind. Thank you for your attention. These patterns will serve you well as you build and evolve distributed systems.