Getting Started with Kubernetes: Deploying Microservice Architectures at Scale
In the rapidly evolving landscape of software architecture, microservices have emerged as the de facto standard for building scalable, maintainable applications. However, managing hundreds or thousands of containerized services across distributed infrastructure presents significant operational challenges. Enter Kubernetes—the container orchestration platform that has revolutionized how we deploy, scale, and manage microservice architectures.
This comprehensive guide will take you from Kubernetes fundamentals to deploying a production-ready microservice architecture, covering essential concepts, practical implementations, and industry best practices that will enable you to harness the full power of container orchestration.
Why This Guide Matters: According to the CNCF Annual Survey 2021, 96% of organizations are either using or evaluating Kubernetes. Understanding how to properly architect and deploy microservices on Kubernetes is no longer optional—it’s essential for modern software development.
Table of Contents
- Why Microservices Need Orchestration
- Kubernetes Fundamentals
- Setting Up Your Development Environment
- Building a Sample Microservice Architecture
- Core Kubernetes Resources for Microservices
- Service Discovery and Communication
- Configuration Management and Secrets
- Persistent Storage in Microservices
- Monitoring and Observability
- Production Deployment Strategies
- Security Best Practices
- Troubleshooting Common Issues
- Advanced Patterns and Next Steps
Why Microservices Need Orchestration
The Microservice Challenge
Before diving into Kubernetes, it’s crucial to understand why we need orchestration in the first place. Microservices architecture breaks down monolithic applications into smaller, independent services that communicate over well-defined APIs. While this approach offers numerous benefits—including improved scalability, technology diversity, and fault isolation—it introduces operational complexity that becomes unmanageable without proper tooling.
The Problem of Scale: Imagine manually managing even a modest e-commerce platform with just 10 microservices, each running 3 instances across 5 servers. That’s already 30 containers to track, update, and maintain. Now scale this to Netflix’s 700+ microservices or Amazon’s thousands of services, and the manual approach becomes impossible.
Consider a typical e-commerce platform decomposed into microservices:
- User Service: Handles authentication and user profiles
- Product Catalog Service: Manages product information and inventory
- Order Service: Processes orders and manages order state
- Payment Service: Handles payment processing and financial transactions
- Notification Service: Sends emails, SMS, and push notifications
- Analytics Service: Collects and processes user behavior data
Why Each Service Needs Multiple Instances: In production, you never run just one instance of a service. Here’s why:
- High Availability: If one instance crashes, others continue serving requests
- Load Distribution: Multiple instances can handle more concurrent users
- Zero-Downtime Deployments: You can update instances one at a time
- Geographic Distribution: Instances in different regions reduce latency
Each service might run multiple instances for redundancy and load distribution. Without orchestration, managing this ecosystem manually involves:
- Deployment Complexity: Coordinating deployments across multiple services and environments becomes a nightmare when done manually. A single update might require touching dozens of servers and hundreds of configuration files.
- Service Discovery: Services need to find each other dynamically. When instances start, stop, or move between servers, other services must be able to locate them automatically. Manual service discovery through static IP addresses and configuration files becomes brittle and error-prone at scale.
- Load Balancing: Traffic must be distributed across healthy service instances. Manual load balancer configuration requires constant updates as instances come and go, leading to single points of failure and uneven load distribution.
- Health Monitoring: Detecting failed instances and replacing them automatically is critical for maintaining uptime. Manual monitoring means someone needs to be watching dashboards 24/7 and manually restarting failed services.
- Resource Management: Efficiently utilizing CPU, memory, and storage across the cluster requires intelligent placement decisions. Manual resource allocation leads to waste and performance bottlenecks.
- Configuration Management: Safely distributing configuration and secrets to services without hardcoding values or exposing sensitive information becomes complex across multiple environments.
- Scaling: Automatically adjusting the number of instances based on demand requires real-time monitoring and rapid response. Manual scaling means either over-provisioning (wasting money) or under-provisioning (poor user experience).
Real-World Context: As Martin Fowler explains in his seminal Microservices article, “The microservice architectural style is an approach to developing a single application as a suite of small services.” However, as he also notes, this approach introduces significant operational overhead that requires sophisticated tooling to manage effectively.
How Kubernetes Solves These Challenges
Kubernetes doesn’t just solve these problems—it fundamentally changes how we think about application deployment and management. Instead of imperative scripts (“run this command, then that command”), Kubernetes uses a declarative approach where you describe the desired state, and the system continuously works to maintain that state.
Why Declarative is Better: Think of it like a thermostat. Instead of manually turning the heater on and off (imperative), you set the desired temperature (declarative), and the thermostat automatically maintains it. Similarly, you tell Kubernetes “I want 3 instances of my user service running,” and it ensures that’s always true, even if instances crash or nodes fail.
Declarative Configuration: Instead of imperative scripts, you describe the desired state of your system, and Kubernetes continuously works to maintain that state. This approach, as detailed in the Kubernetes documentation on declarative management, provides better reliability and consistency.
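To make the contrast concrete, here is a minimal sketch (the deployment name is illustrative): the imperative approach issues one-off commands, while the declarative approach applies a manifest that Kubernetes reconciles continuously:
# Imperative: each command mutates cluster state directly
kubectl create deployment my-app --image=nginx:1.25
kubectl scale deployment my-app --replicas=3
# Declarative: describe the desired state in a manifest, apply it,
# and let the control plane reconcile toward it
kubectl apply -f my-app-deployment.yaml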
Service Discovery and Load Balancing: Built-in mechanisms for services to find each other and distribute traffic automatically. Kubernetes creates DNS entries for every service, making service discovery as simple as making an HTTP request to a service name.
Self-Healing: Automatic replacement of failed containers, rescheduling of workloads on healthy nodes, and health checking. The Kubernetes control plane continuously monitors the cluster state and takes corrective actions.
Horizontal Scaling: Automatic scaling of applications based on CPU usage, memory consumption, or custom metrics like request rate or queue length.
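As an illustration, a HorizontalPodAutoscaler can scale the user service we deploy later in this guide based on CPU utilization. This is a sketch using the autoscaling/v2 API; it assumes metrics-server is installed (for example via the Minikube addon shown below):
# hpa.yaml - scale user-service between 2 and 10 replicas on CPU load
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
  namespace: microservices
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # scale out when average CPU exceeds 70% of requests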
Rolling Updates and Rollbacks: Zero-downtime deployments with automatic rollback capabilities when issues are detected. This implements the deployment patterns described in Jez Humble’s “Continuous Delivery”.
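In practice, a rolling update and rollback are driven with a few kubectl commands (deployment and image names here are illustrative, matching the user service we build later):
# Trigger a rolling update by changing the container image
kubectl set image deployment/user-service user-service=your-registry/user-service:v2
# Watch the rollout progress
kubectl rollout status deployment/user-service
# Inspect revision history
kubectl rollout history deployment/user-service
# Roll back to the previous revision if the new version misbehaves
kubectl rollout undo deployment/user-service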
Resource Management: Efficient bin-packing of containers onto cluster nodes with resource guarantees and limits, ensuring optimal utilization while preventing resource contention.
Learning Path: For a deeper understanding of these concepts, I recommend starting with the official Kubernetes documentation and complementing it with Kelsey Hightower’s Kubernetes The Hard Way tutorial for hands-on experience with the underlying components.
Kubernetes Fundamentals
Architecture Overview
Understanding Kubernetes architecture is crucial because it explains why certain design decisions were made and how the system achieves its reliability and scalability goals.
Why the Master-Worker Pattern?: Kubernetes follows a master-worker architecture where the control plane manages the cluster state, and worker nodes run the actual application workloads. This separation of concerns is a fundamental design principle that provides several benefits:
- Separation of Concerns: Control logic is separated from workload execution
- Scalability: You can scale control plane and worker nodes independently
- Fault Tolerance: Control plane can be replicated for high availability
- Security: Control plane can be isolated from workloads
Control Plane Components
API Server: The central management entity that exposes the Kubernetes API. Why is this important? Everything in Kubernetes goes through the API server—kubectl commands, internal component communication, and even your applications. This single point of entry provides authentication, authorization, and validation for all cluster operations.
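You can observe this single entry point directly: raising kubectl's verbosity prints the REST calls it makes to the API server.
# -v=6 logs each HTTP request kubectl issues (method, URL, response code)
kubectl get pods -v=6
# You can also query the API directly through kubectl's raw mode
kubectl get --raw /api/v1 | head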
etcd: A distributed key-value store that maintains the cluster’s persistent state. Why etcd specifically? Kubernetes needs a data store that can handle distributed consensus (ensuring all control plane nodes agree on the cluster state) and provides strong consistency guarantees. etcd’s Raft consensus algorithm makes it perfect for this role.
Scheduler: Determines which nodes should run newly created pods based on resource requirements, constraints, and policies. Why not just random placement? Intelligent scheduling is crucial for resource efficiency, performance, and meeting application requirements like anti-affinity rules or GPU requirements.
Controller Manager: Runs various controllers that handle routine tasks like ensuring the desired number of replicas are running. Why the controller pattern? Controllers implement the “reconciliation loop”—continuously comparing desired state with actual state and taking corrective actions. This is the foundation of Kubernetes’ self-healing capabilities.
Deep Dive: For an excellent explanation of these components and their interactions, see the Kubernetes Architecture Explained article by Platform9.
Worker Node Components
kubelet: The primary node agent that communicates with the API server and manages pods and containers on the node. Why on every node? The kubelet is responsible for the actual container lifecycle management, health checking, and resource monitoring. It’s the “hands” of Kubernetes on each node.
kube-proxy: Maintains network rules for service load balancing and enables communication between pods across the cluster. Why not just use DNS? While DNS can resolve service names to IPs, kube-proxy implements the actual load balancing and provides features like session affinity and different load balancing algorithms.
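For example, session affinity is declared on the Service and enforced by kube-proxy; a minimal sketch (service and app names illustrative):
apiVersion: v1
kind: Service
metadata:
  name: sticky-service
spec:
  selector:
    app: my-app
  sessionAffinity: ClientIP # route a given client IP to the same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # stickiness window (the default is 3 hours)
  ports:
  - port: 80
    targetPort: 8080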
Container Runtime: The software responsible for running containers, such as containerd or CRI-O (Docker Engine works via the cri-dockerd adapter, since built-in dockershim support was removed in Kubernetes 1.24). Why pluggable? Different organizations have different requirements for container runtimes (security, performance, compliance), so Kubernetes uses the Container Runtime Interface (CRI) to support multiple options.
Core Concepts
Understanding these fundamental concepts is crucial for effective Kubernetes usage. Each concept solves specific problems that arise in distributed systems:
Pod: The smallest deployable unit in Kubernetes, typically containing a single container along with shared storage and network. Why not just containers? Pods provide a shared execution environment (network namespace, storage volumes) that enables patterns like sidecar containers for logging, monitoring, or service mesh proxies.
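A minimal sketch of the sidecar pattern: two containers in one pod share a volume (and the network namespace), so a log forwarder can read the application's log files directly (names and images illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
  - name: logs
    emptyDir: {} # shared scratch space, lives as long as the pod
  containers:
  - name: app # writes log lines to the shared volume
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /logs/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: log-forwarder # the sidecar: tails the same files
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /logs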
Service: An abstraction that defines a logical set of pods and enables network access to them. Why not direct pod IPs? Pods are ephemeral—they can be created, destroyed, and rescheduled at any time. Services provide a stable network endpoint that automatically routes traffic to healthy pods.
Deployment: Manages the deployment and scaling of pods, ensuring the desired number of replicas are running. Why not create pods directly? Deployments provide declarative updates, rollback capabilities, and replica management. They implement the deployment patterns that enable zero-downtime updates.
ConfigMap and Secret: Objects for managing configuration data and sensitive information separately from application code. Why separate configuration? Following the Twelve-Factor App methodology, configuration should be environment-specific and externalized from code. This enables the same container image to run in different environments.
Namespace: Virtual clusters within a physical cluster, providing scope for resource names and enabling multi-tenancy. Why namespaces? They provide isolation, resource quotas, and RBAC boundaries, enabling multiple teams or environments to share a cluster safely.
Ingress: Manages external access to services, typically HTTP/HTTPS, with features like load balancing, SSL termination, and name-based virtual hosting. Why not just LoadBalancer services? Ingress provides Layer 7 (HTTP) features and can consolidate multiple services behind a single load balancer, reducing cloud provider costs.
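For instance, a single Ingress can fan out paths to the services we deploy later in this guide. A sketch, assuming an NGINX ingress controller (such as the Minikube addon enabled below) and an illustrative hostname:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  namespace: microservices
spec:
  ingressClassName: nginx
  rules:
  - host: shop.example.com # illustrative hostname
    http:
      paths:
      - path: /api/users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 3001
      - path: /api/products
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 3002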
Essential Reading: The Kubernetes Concepts documentation provides authoritative explanations of these concepts. Additionally, Brendan Burns’ book “Kubernetes: Up and Running” offers excellent practical insights from one of Kubernetes’ creators.
Setting Up Your Development Environment
Before jumping into production deployments, it’s essential to have a local development environment where you can experiment safely. Why start locally? Local development provides fast feedback loops, no cloud costs, and the ability to break things without consequences.
Local Development Options
Why Multiple Options? Different developers have different needs—some want full multi-node clusters for testing, others want lightweight single-node setups for development. Each option has trade-offs:
Option 1: Minikube
When to Use: Perfect for learning Kubernetes concepts and testing applications that don’t require multi-node features.
Why Minikube? It provides a full Kubernetes cluster in a single VM or container (depending on the driver), including all control plane components. This gives you the most authentic Kubernetes experience while remaining lightweight.
Minikube runs a single-node Kubernetes cluster locally, perfect for development and testing:
# Install minikube (macOS)
brew install minikube
# Start minikube with specific resource allocation
# Why these resources? 8GB RAM and 4 CPUs provide enough resources
# for running multiple microservices without overwhelming your laptop
minikube start --memory=8192 --cpus=4 --driver=docker
# Enable useful addons
# Why these addons? They provide essential cluster services you'll need
minikube addons enable ingress # HTTP/HTTPS load balancing
minikube addons enable dashboard # Web UI for cluster management
minikube addons enable metrics-server # Resource usage metrics for HPA
Option 2: Kind (Kubernetes in Docker)
When to Use: Best for CI/CD pipelines and when you need to test multi-node scenarios.
Why Kind? It runs Kubernetes nodes as Docker containers, making it very fast to create and destroy clusters. It’s also what many Kubernetes developers use for testing.
Kind runs Kubernetes clusters using Docker containers as nodes:
# Install kind
go install sigs.k8s.io/kind@latest
# Why multi-node? This configuration simulates a real cluster
# with separate control plane and worker nodes, enabling you to test
# scenarios like node affinity, taints, and tolerations
cat << EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
# Why port mapping? This exposes the ingress controller
kubeadmConfigPatches:
- |
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
node-labels: "ingress-ready=true"
extraPortMappings:
- containerPort: 80
hostPort: 80
protocol: TCP
- containerPort: 443
hostPort: 443
protocol: TCP
- role: worker
- role: worker
EOF
kind create cluster --config kind-config.yaml --name microservices-demo
Option 3: Docker Desktop
When to Use: If you’re already using Docker Desktop and want the simplest setup.
Why Docker Desktop? It provides a one-click Kubernetes cluster that integrates seamlessly with Docker Desktop’s interface. However, it’s limited to single-node setups.
Docker Desktop includes a built-in Kubernetes cluster that’s easy to enable through the Docker Desktop settings.
Recommendation: For this tutorial, I recommend starting with Minikube as it provides the best balance of authenticity and simplicity. The official Minikube documentation provides comprehensive setup instructions for all platforms.
Essential Tools
Why These Tools? Each tool serves a specific purpose in the Kubernetes development workflow:
Install these tools for effective Kubernetes development:
# kubectl - Kubernetes CLI
# Why kubectl? It's the primary interface for interacting with Kubernetes clusters
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/amd64/kubectl"
# On Apple Silicon, substitute darwin/arm64 in the URL above
chmod +x kubectl && sudo mv kubectl /usr/local/bin/
# Helm - Package manager for Kubernetes
# Why Helm? It simplifies deploying complex applications with templates and dependency management
brew install helm
# k9s - Terminal-based cluster management tool
# Why k9s? It provides a user-friendly TUI for cluster management and debugging
brew install k9s
# kubectx/kubens - Context and namespace switching
# Why these tools? They make it easy to switch between clusters and namespaces
brew install kubectx
Additional Recommended Tools:
# stern - Multi-pod log tailing
# Why stern? It allows you to tail logs from multiple pods simultaneously
brew install stern
# kustomize - Configuration management
# Why kustomize? It's built into kubectl but the standalone version has more features
brew install kustomize
# kubeconform - Kubernetes YAML validation
# Why validate? It catches schema errors before you apply manifests to the cluster
# (kubeconform is the actively maintained successor to the now-deprecated kubeval)
brew install kubeconform
Verifying Your Setup
Why Verify? It’s crucial to ensure your cluster is working correctly before proceeding with application deployment.
Test your Kubernetes installation:
# Check cluster info
kubectl cluster-info
# This should show the API server URL and other cluster services
# View nodes
kubectl get nodes
# You should see your minikube node in "Ready" state
# Check system pods
kubectl get pods -n kube-system
# All system pods should be in "Running" or "Completed" state
# Test deployment capability
kubectl create deployment nginx --image=nginx
kubectl get deployments
# Should show nginx deployment with 1/1 ready replicas
# Test service creation
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get services
# Clean up test resources
kubectl delete deployment nginx
kubectl delete service nginx
Troubleshooting Resources: If you encounter issues, the Kubernetes Troubleshooting Guide and Minikube Troubleshooting documentation provide comprehensive solutions.
Building a Sample Microservice Architecture
Now that we have our development environment ready, let’s build a realistic microservice architecture. Why start with a concrete example? Learning Kubernetes concepts in isolation can be abstract and difficult to relate to real-world scenarios. By building an actual application, we can understand how the pieces fit together.
Architecture Overview
Design Philosophy: Our sample application follows the principles outlined in Sam Newman’s “Building Microservices” and implements patterns from the Microservices.io pattern library.
Why This Architecture? We’re building an e-commerce platform because it naturally demonstrates key microservice challenges:
- Data consistency across services (orders, inventory, payments)
- Inter-service communication patterns
- Different scaling requirements (catalog browsing vs. order processing)
- Security boundaries (user data vs. payment processing)
Our sample application consists of:
- Frontend Service: React application serving the user interface
  - Why separate? Frontend deployment cycles often differ from backend services
  - Scaling pattern: CDN + lightweight Node.js for SSR
- API Gateway: Routes requests to appropriate backend services
  - Why needed? Provides a single entry point, authentication, rate limiting, and request routing
  - Alternative: Could use an Ingress Controller, but an API Gateway provides more application-level features
- User Service: Manages user authentication and profiles
  - Why separate? User management has different security, scaling, and compliance requirements
  - Data store: PostgreSQL for strong consistency of user data
- Product Service: Handles product catalog and inventory
  - Why separate? Product browsing is typically read-heavy and needs different caching strategies
  - Scaling pattern: Read replicas, aggressive caching
- Order Service: Processes orders and manages order lifecycle
  - Why separate? Order processing involves complex business logic and state management
  - Integration pattern: Event-driven communication with other services
- Database Services: PostgreSQL for persistent data storage
  - Why PostgreSQL? ACID compliance for financial data, mature Kubernetes operators available
  - Alternative considerations: Could use separate databases per service for true isolation
- Redis: For caching and session storage
  - Why Redis? High-performance caching, session storage, and pub/sub capabilities
  - Usage patterns: Cache-aside pattern for product data, session store for user state
Architecture Inspiration: This design follows patterns described in Chris Richardson’s “Microservices Patterns” and incorporates lessons from companies like Netflix and Amazon as documented in their architecture blogs.
Sample Application Code Structure
Why This Structure? This organization follows the Twelve-Factor App methodology and makes it easy to manage multiple services in a single repository (monorepo approach) or split them into separate repositories later.
First, let’s create the directory structure for our microservices:
mkdir k8s-microservices-demo
cd k8s-microservices-demo
# Create service directories
# Why this structure? Each service is self-contained with its own Dockerfile and source code
mkdir -p {frontend,api-gateway,user-service,product-service,order-service}/src
# Create Kubernetes manifests directory
# Why separate base and overlays? This follows Kustomize patterns for environment-specific configuration
mkdir -p k8s/{base,overlays/{development,staging,production}}
# Create Docker images directory for multi-stage builds
mkdir -p docker-images
# Create scripts directory for automation
mkdir -p scripts/{database,monitoring,deployment}
User Service Implementation
Why Node.js? It’s excellent for I/O-heavy microservices, has great Kubernetes client libraries, and provides fast development cycles. However, the patterns shown here apply to any language.
Security Considerations: This implementation follows OWASP guidelines for authentication and includes password hashing, JWT tokens, and input validation.
Here’s a simplified Node.js user service:
// user-service/src/app.js
const express = require('express');
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');
const { Pool } = require('pg');
const helmet = require('helmet'); // Security middleware
const rateLimit = require('express-rate-limit'); // Rate limiting
const app = express();
// Security middleware
app.use(helmet());
app.use(express.json({ limit: '10mb' })); // Prevent payload bombs
// Rate limiting - Why? Prevents brute force attacks and API abuse
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100 // limit each IP to 100 requests per windowMs
});
app.use('/api/', limiter);
// Database connection with connection pooling
// Why connection pooling? Efficiently manages database connections and prevents connection exhaustion
const pool = new Pool({
host: process.env.DB_HOST || 'postgres-service',
port: process.env.DB_PORT || 5432,
database: process.env.DB_NAME || 'userdb',
user: process.env.DB_USER || 'postgres',
password: process.env.DB_PASSWORD || 'password',
max: 20, // Maximum number of clients in the pool
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Graceful shutdown handling
// Why? Ensures database connections are properly closed when the pod terminates
process.on('SIGTERM', async () => {
console.log('SIGTERM received, closing database pool...');
await pool.end();
process.exit(0);
});
// Health check endpoint - Why detailed health checks?
// Kubernetes uses these for liveness and readiness probes
app.get('/health', async (req, res) => {
try {
// Check database connectivity
await pool.query('SELECT 1');
res.json({
status: 'healthy',
service: 'user-service',
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || '1.0.0'
});
} catch (error) {
res.status(503).json({
status: 'unhealthy',
error: 'Database connection failed'
});
}
});
// Readiness probe - Why separate from health?
// Readiness indicates when the service can accept traffic
app.get('/ready', async (req, res) => {
try {
// Check if database schema is ready
await pool.query('SELECT count(*) FROM users LIMIT 1');
res.json({ status: 'ready' });
} catch (error) {
res.status(503).json({ status: 'not ready', error: error.message });
}
});
// User registration with comprehensive validation
app.post('/api/users/register', async (req, res) => {
try {
const { email, password, name } = req.body;
// Input validation - Why? Prevent SQL injection and data corruption
if (!email || !password || !name) {
return res.status(400).json({ error: 'Missing required fields' });
}
if (password.length < 8) {
return res.status(400).json({ error: 'Password must be at least 8 characters' });
}
// Why salt rounds = 12? Balances security with performance
const hashedPassword = await bcrypt.hash(password, 12);
const result = await pool.query(
'INSERT INTO users (email, password_hash, name, created_at) VALUES ($1, $2, $3, NOW()) RETURNING id, email, name, created_at',
[email, hashedPassword, name]
);
// Why JWT? Stateless authentication that works well in distributed systems
const token = jwt.sign(
{
userId: result.rows[0].id,
email: result.rows[0].email
},
process.env.JWT_SECRET || 'default-secret',
{ expiresIn: '24h' } // Why 24h? Balance between security and user experience
);
res.status(201).json({
user: result.rows[0],
token,
expiresIn: '24h'
});
} catch (error) {
console.error('Registration error:', error);
// Why check for unique constraint? Provide user-friendly error messages
if (error.code === '23505') { // PostgreSQL unique violation
return res.status(409).json({ error: 'Email already exists' });
}
res.status(500).json({ error: 'Registration failed' });
}
});
// User authentication with security best practices
app.post('/api/users/login', async (req, res) => {
try {
const { email, password } = req.body;
if (!email || !password) {
return res.status(400).json({ error: 'Email and password required' });
}
const result = await pool.query(
'SELECT id, email, name, password_hash FROM users WHERE email = $1',
[email]
);
if (result.rows.length === 0) {
// Why same error message? Prevents email enumeration attacks
return res.status(401).json({ error: 'Invalid credentials' });
}
const user = result.rows[0];
const validPassword = await bcrypt.compare(password, user.password_hash);
if (!validPassword) {
return res.status(401).json({ error: 'Invalid credentials' });
}
const token = jwt.sign(
{
userId: user.id,
email: user.email
},
process.env.JWT_SECRET || 'default-secret',
{ expiresIn: '24h' }
);
res.json({
user: { id: user.id, email: user.email, name: user.name },
token,
expiresIn: '24h'
});
} catch (error) {
console.error('Login error:', error);
res.status(500).json({ error: 'Login failed' });
}
});
// JWT middleware for protected routes
const authenticateToken = (req, res, next) => {
const authHeader = req.headers['authorization'];
const token = authHeader && authHeader.split(' ')[1]; // Bearer TOKEN
if (!token) {
return res.status(401).json({ error: 'Access token required' });
}
jwt.verify(token, process.env.JWT_SECRET || 'default-secret', (err, user) => {
if (err) {
return res.status(403).json({ error: 'Invalid or expired token' });
}
req.user = user;
next();
});
};
// Protected route example
app.get('/api/users/profile', authenticateToken, async (req, res) => {
try {
const result = await pool.query(
'SELECT id, email, name, created_at FROM users WHERE id = $1',
[req.user.userId]
);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'User not found' });
}
res.json(result.rows[0]);
} catch (error) {
console.error('Profile fetch error:', error);
res.status(500).json({ error: 'Failed to fetch profile' });
}
});
const port = process.env.PORT || 3001;
app.listen(port, () => {
console.log(`User service listening on port ${port}`);
console.log(`Environment: ${process.env.NODE_ENV || 'development'}`);
});
Product Service Implementation
Why Separate Product Service? Product browsing has different characteristics than user management:
- Read-heavy workload (many more reads than writes)
- Different caching requirements (products can be cached aggressively)
- Different scaling patterns (might need more read replicas)
// product-service/src/app.js
const express = require('express');
const { Pool } = require('pg');
const Redis = require('redis');
const helmet = require('helmet');
const app = express();
app.use(helmet());
app.use(express.json());
// Database connection
const pool = new Pool({
host: process.env.DB_HOST || 'postgres-service',
port: process.env.DB_PORT || 5432,
database: process.env.DB_NAME || 'productdb',
user: process.env.DB_USER || 'postgres',
password: process.env.DB_PASSWORD || 'password',
max: 20,
});
// Redis client for caching (node-redis v4 API, which returns promises)
// Why Redis? Fast in-memory cache that reduces database load
const redis = Redis.createClient({
  url: `redis://${process.env.REDIS_HOST || 'redis-service'}:${process.env.REDIS_PORT || 6379}`,
  socket: {
    // Why a bounded retry strategy? Fail fast instead of reconnecting forever
    reconnectStrategy: (retries) => {
      if (retries > 10) {
        return new Error('Retry attempts exhausted');
      }
      return Math.min(retries * 100, 3000); // back off, capped at 3 seconds
    }
  }
});
redis.on('error', (err) => {
  console.warn('Redis connection error:', err);
});
// node-redis v4 requires an explicit connect; the service still starts
// (with caching degraded) if Redis is unavailable
redis.connect().catch((err) => console.warn('Redis connect failed:', err));
// Health check with dependency checks
app.get('/health', async (req, res) => {
try {
await pool.query('SELECT 1');
const redisHealthy = redis.isReady; // node-redis v4 exposes isReady rather than connected
res.json({
status: 'healthy',
service: 'product-service',
dependencies: {
database: 'healthy',
redis: redisHealthy ? 'healthy' : 'degraded'
}
});
} catch (error) {
res.status(503).json({
status: 'unhealthy',
error: 'Database connection failed'
});
}
});
// Cache-aside pattern implementation
// Why cache-aside? Gives us control over what to cache and when to invalidate
const getCachedProducts = async (cacheKey) => {
try {
const cached = await redis.get(cacheKey);
return cached ? JSON.parse(cached) : null;
} catch (error) {
console.warn('Cache read error:', error);
return null;
}
};
const setCachedProducts = async (cacheKey, data, ttl = 300) => {
try {
await redis.setEx(cacheKey, ttl, JSON.stringify(data)); // v4 method name is setEx
} catch (error) {
console.warn('Cache write error:', error);
}
};
// Get all products with caching
app.get('/api/products', async (req, res) => {
try {
const { category, page = 1, limit = 20 } = req.query;
const cacheKey = `products:${category || 'all'}:${page}:${limit}`;
// Try cache first
let products = await getCachedProducts(cacheKey);
if (!products) {
// Cache miss - fetch from database
const offset = (page - 1) * limit;
let query = 'SELECT id, name, description, price, stock_quantity, category FROM products WHERE active = true';
let params = [];
if (category) {
query += ' AND category = $1';
params.push(category);
}
query += ' ORDER BY created_at DESC LIMIT $' + (params.length + 1) + ' OFFSET $' + (params.length + 2);
params.push(limit, offset);
const result = await pool.query(query, params);
products = result.rows;
// Cache the results
await setCachedProducts(cacheKey, products, 300); // 5-minute cache
}
res.json(products);
} catch (error) {
console.error('Error fetching products:', error);
res.status(500).json({ error: 'Failed to fetch products' });
}
});
// Get product by ID with caching
app.get('/api/products/:id', async (req, res) => {
try {
const { id } = req.params;
const cacheKey = `product:${id}`;
let product = await getCachedProducts(cacheKey);
if (!product) {
const result = await pool.query(
'SELECT id, name, description, price, stock_quantity, category, created_at FROM products WHERE id = $1 AND active = true',
[id]
);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Product not found' });
}
product = result.rows[0];
await setCachedProducts(cacheKey, product, 600); // 10-minute cache for individual products
}
res.json(product);
} catch (error) {
console.error('Error fetching product:', error);
res.status(500).json({ error: 'Failed to fetch product' });
}
});
// Create new product (admin only in real app)
app.post('/api/products', async (req, res) => {
try {
const { name, description, price, stock_quantity, category } = req.body;
// Validation
if (!name || !price || stock_quantity === undefined) {
return res.status(400).json({ error: 'Missing required fields' });
}
const result = await pool.query(
'INSERT INTO products (name, description, price, stock_quantity, category, created_at) VALUES ($1, $2, $3, $4, $5, NOW()) RETURNING *',
[name, description, price, stock_quantity, category]
);
// Invalidate relevant caches
// Why invalidate? Ensure cache consistency when data changes
const cachePatterns = ['products:*', `product:${result.rows[0].id}`];
for (const pattern of cachePatterns) {
try {
const keys = await redis.keys(pattern);
if (keys.length > 0) {
await redis.del(keys);
}
} catch (error) {
console.warn('Cache invalidation error:', error);
}
}
res.status(201).json(result.rows[0]);
} catch (error) {
console.error('Error creating product:', error);
res.status(500).json({ error: 'Failed to create product' });
}
});
const port = process.env.PORT || 3002;
app.listen(port, () => {
console.log(`Product service listening on port ${port}`);
});
Dockerizing the Services
Why Multi-stage Builds? They reduce image size by excluding development dependencies and build tools from the final image, improving security and deployment speed.
Security Best Practices: Notice how we run as a non-root user and use specific base image versions for reproducible builds.
Create Dockerfiles for each service:
# user-service/Dockerfile
# Why specific version? Ensures reproducible builds and security updates
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files first - Why? Enables Docker layer caching
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force
# Production stage
FROM node:18-alpine AS production
# Why create user? Security best practice - never run as root
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
WORKDIR /app
# Copy dependencies from builder stage
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs src/ ./src/
COPY --chown=nodejs:nodejs package*.json ./
# Why these labels? Helps with container management and debugging
LABEL maintainer="your-team@company.com" \
version="1.0.0" \
description="User service for microservices demo"
USER nodejs
EXPOSE 3001
# Why node directly? Ensures proper signal handling for graceful shutdowns
CMD ["node", "src/app.js"]
# product-service/Dockerfile
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force
FROM node:18-alpine AS production
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs src/ ./src/
COPY --chown=nodejs:nodejs package*.json ./
LABEL maintainer="your-team@company.com" \
version="1.0.0" \
description="Product service for microservices demo"
USER nodejs
EXPOSE 3002
CMD ["node", "src/app.js"]
Build Optimization: For more advanced Docker optimization techniques, see the Docker Best Practices Guide and Google’s Container Image Building Best Practices.
Core Kubernetes Resources for Microservices
Now let’s deploy our microservices to Kubernetes using core resources. Why start with core resources? Understanding the fundamental building blocks helps you make informed decisions about when to use higher-level abstractions like Helm charts or operators.
Namespace Organization
Why Namespaces? They provide logical isolation, enable resource quotas, and allow multiple teams to share a cluster safely. Think of them as virtual clusters within your physical cluster.
Naming Convention: Use environment-prefixed names (dev-, staging-, prod-) or team-based names (team-a-, team-b-) depending on your organization’s structure.
Create namespaces to organize resources:
# k8s/base/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: microservices
labels:
name: microservices
environment: development
# Why these labels? They enable advanced features like network policies and resource quotas
app.kubernetes.io/name: ecommerce-platform
app.kubernetes.io/part-of: microservices-demo
---
# Optional: Resource quota to prevent resource exhaustion
apiVersion: v1
kind: ResourceQuota
metadata:
name: microservices-quota
namespace: microservices
spec:
hard:
requests.cpu: "4" # Total CPU requests allowed
requests.memory: 8Gi # Total memory requests allowed
limits.cpu: "8" # Total CPU limits allowed
limits.memory: 16Gi # Total memory limits allowed
pods: "20" # Maximum number of pods
services: "10" # Maximum number of services
persistentvolumeclaims: "5" # Maximum PVCs
Resource Management: For more on resource management, see the Kubernetes Resource Management Guide and the LimitRange documentation.
Database Deployment
Why StatefulSet for Database? StatefulSets provide:
- Stable network identities (predictable pod names)
- Ordered deployment and scaling (important for clustered databases)
- Persistent storage that survives pod rescheduling
Security Note: In production, use external managed databases (RDS, Cloud SQL) or dedicated database operators like PostgreSQL Operator.
Deploy PostgreSQL as a StatefulSet for data persistence:
# k8s/base/postgres.yaml
# Why Secret? Keeps sensitive data separate from configuration
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
namespace: microservices
type: Opaque
data:
# Why base64? Kubernetes Secrets use base64 encoding (not encryption!)
# In production, use external secret management systems
postgres-password: cGFzc3dvcmQ= # "password"
postgres-user: cG9zdGdyZXM= # "postgres"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
namespace: microservices
data:
POSTGRES_DB: ecommerce
POSTGRES_USER: postgres
# Why init script? Sets up database schema on first startup
init.sql: |
-- Create databases for each service
-- Note: PostgreSQL has no CREATE DATABASE IF NOT EXISTS; init scripts run only
-- on the first startup of an empty data directory, so plain CREATE DATABASE is safe
CREATE DATABASE userdb;
CREATE DATABASE productdb;
CREATE DATABASE orderdb;
-- Create users table
\c userdb
CREATE TABLE IF NOT EXISTS users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create products table
\c productdb
CREATE TABLE IF NOT EXISTS products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
stock_quantity INTEGER DEFAULT 0,
category VARCHAR(100),
active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data
INSERT INTO products (name, description, price, stock_quantity, category) VALUES
('Laptop', 'High-performance laptop', 1299.99, 10, 'Electronics'),
('Coffee Mug', 'Ceramic coffee mug', 12.99, 50, 'Kitchen'),
('Book', 'Programming guide', 39.99, 25, 'Books')
ON CONFLICT DO NOTHING;
---
# Why PVC? Ensures data persists even if pods are rescheduled
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-pvc
namespace: microservices
spec:
accessModes:
- ReadWriteOnce # Why RWO? A single PostgreSQL instance needs exclusive access to its data directory
resources:
requests:
storage: 10Gi
# storageClassName: fast-ssd # Uncomment for specific storage class
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: microservices
labels:
app: postgres
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
spec:
serviceName: postgres-service
replicas: 1 # Why 1? PostgreSQL primary-replica setup requires special configuration
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
spec:
containers:
- name: postgres
image: postgres:15-alpine # Why alpine? Smaller image size, faster deployments
env:
- name: POSTGRES_DB
valueFrom:
configMapKeyRef:
name: postgres-config
key: POSTGRES_DB
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-user
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-password
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
ports:
- containerPort: 5432
name: postgresql
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
- name: postgres-config-volume
mountPath: /docker-entrypoint-initdb.d
# Why these probes? Ensure PostgreSQL is ready before accepting connections
livenessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 5
periodSeconds: 5
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: postgres-pvc
- name: postgres-config-volume
configMap:
name: postgres-config
---
# Why ClusterIP? Database should only be accessible within the cluster
apiVersion: v1
kind: Service
metadata:
name: postgres-service
namespace: microservices
labels:
app: postgres
app.kubernetes.io/name: postgresql
app.kubernetes.io/component: database
spec:
selector:
app: postgres
ports:
- port: 5432
targetPort: 5432
name: postgresql
type: ClusterIP
User Service Deployment
Why Deployment? For stateless services, Deployments provide:
- Rolling updates with zero downtime
- Replica management ensures desired number of instances
- Rollback capabilities if deployments fail
Health Checks: Notice the liveness and readiness probes—these are crucial for reliable deployments.
# k8s/base/user-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: microservices
labels:
app: user-service
app.kubernetes.io/name: user-service
app.kubernetes.io/component: backend
app.kubernetes.io/part-of: ecommerce-platform
spec:
replicas: 2 # Why 2? Minimum for high availability without over-provisioning
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1 # Why 1? Ensures at least one instance is always running
maxSurge: 1 # Why 1? Limits resource usage during deployments
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
version: v1 # Why version label? Enables advanced deployment strategies
spec:
# Why security context? Implements security best practices
securityContext:
runAsNonRoot: true
runAsUser: 1001
fsGroup: 1001
containers:
- name: user-service
image: your-registry/user-service:latest # Replace with your registry
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3001
name: http
protocol: TCP
env:
- name: NODE_ENV
value: production
- name: PORT
value: "3001"
- name: DB_HOST
value: postgres-service # Why service name? Kubernetes DNS resolution
- name: DB_NAME
value: userdb
- name: DB_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-user
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-password
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: app-secrets
key: jwt-secret
# Why liveness probe? Kubernetes restarts unhealthy containers
livenessProbe:
httpGet:
path: /health
port: 3001
initialDelaySeconds: 30 # Why 30s? Allows time for application startup
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Why readiness probe? Controls when pod receives traffic
readinessProbe:
httpGet:
path: /ready
port: 3001
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
# Why resource limits? Prevents resource contention and enables better scheduling
resources:
requests:
memory: "128Mi" # Minimum guaranteed memory
cpu: "100m" # Minimum guaranteed CPU (0.1 core)
limits:
memory: "256Mi" # Maximum memory before OOMKill
cpu: "200m" # Maximum CPU usage
# Why security context? Additional container-level security
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true # Prevents runtime file system modifications
capabilities:
drop:
- ALL # Drop all capabilities for security
# Why volume mounts? Read-only root filesystem requires writable tmp
volumeMounts:
- name: tmp-volume
mountPath: /tmp
- name: var-run-volume
mountPath: /var/run
volumes:
- name: tmp-volume
emptyDir: {}
- name: var-run-volume
emptyDir: {}
# Why restart policy? Ensures failed containers are restarted
restartPolicy: Always
---
# Why separate service? Provides stable endpoint regardless of pod changes
apiVersion: v1
kind: Service
metadata:
name: user-service
namespace: microservices
labels:
app: user-service
app.kubernetes.io/name: user-service
app.kubernetes.io/component: backend
spec:
selector:
app: user-service
ports:
- port: 3001
targetPort: 3001
protocol: TCP
name: http
type: ClusterIP # Why ClusterIP? Internal service, accessed via API gateway
---
# Application secrets
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
namespace: microservices
type: Opaque
data:
# Why strong JWT secret? Security of authentication tokens
jwt-secret: bXktc2VjdXJlLWp3dC1rZXktd2l0aC1lbm91Z2gtZW50cm9weQ== # base64 encoded secure key
Product Service Deployment
Why Different Replica Count? Product browsing is typically more resource-intensive than user management, so we scale it differently.
# k8s/base/product-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: product-service
namespace: microservices
labels:
app: product-service
app.kubernetes.io/name: product-service
app.kubernetes.io/component: backend
spec:
replicas: 3 # Why 3? Higher traffic expected for product browsing
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 2 # Why 2? Allows faster scaling during high traffic
selector:
matchLabels:
app: product-service
template:
metadata:
labels:
app: product-service
version: v1
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1001
fsGroup: 1001
containers:
- name: product-service
image: your-registry/product-service:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3002
name: http
protocol: TCP
env:
- name: NODE_ENV
value: production
- name: PORT
value: "3002"
- name: DB_HOST
value: postgres-service
- name: DB_NAME
value: productdb
- name: REDIS_HOST
value: redis-service # Why Redis? Caching for better performance
- name: DB_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-user
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-password
livenessProbe:
httpGet:
path: /health
port: 3002
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 3002
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp-volume
mountPath: /tmp
- name: var-run-volume
mountPath: /var/run
volumes:
- name: tmp-volume
emptyDir: {}
- name: var-run-volume
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: product-service
namespace: microservices
labels:
app: product-service
app.kubernetes.io/name: product-service
app.kubernetes.io/component: backend
spec:
selector:
app: product-service
ports:
- port: 3002
targetPort: 3002
protocol: TCP
name: http
type: ClusterIP
Redis Deployment for Caching
Why Redis? It provides high-performance caching and session storage, crucial for microservice architectures where you want to minimize database load.
# k8s/base/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: microservices
labels:
app: redis
app.kubernetes.io/name: redis
app.kubernetes.io/component: cache
spec:
replicas: 1 # Why 1? Redis clustering requires special configuration
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
command:
- redis-server
- --appendonly yes # Why? Enables persistence
- --maxmemory 256mb
- --maxmemory-policy allkeys-lru # Why LRU? Good default for caching
ports:
- containerPort: 6379
name: redis
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
volumeMounts:
- name: redis-data
mountPath: /data
volumes:
- name: redis-data
emptyDir: {} # Why emptyDir? For demo purposes; use PVC in production
---
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: microservices
labels:
app: redis
spec:
selector:
app: redis
ports:
- port: 6379
targetPort: 6379
name: redis
type: ClusterIP
API Gateway with NGINX
Why API Gateway? It provides:
- Single entry point for all client requests
- Request routing to appropriate backend services
- Cross-cutting concerns like authentication, rate limiting, CORS
- Protocol translation (HTTP to gRPC, etc.)
# k8s/base/api-gateway.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: nginx-config
namespace: microservices
data:
nginx.conf: |
# Why custom nginx.conf? Need specific routing rules for microservices
events {
worker_connections 1024;
}
http {
# Why upstream blocks? Enable load balancing and health checks
upstream user-service {
server user-service:3001;
}
upstream product-service {
server product-service:3002;
}
# Why custom log format? Better observability for microservices
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'upstream_addr=$upstream_addr '
'upstream_response_time=$upstream_response_time';
access_log /var/log/nginx/access.log main;
server {
listen 80;
# Why CORS headers? Enable frontend applications to call APIs
add_header Access-Control-Allow-Origin *;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS";
add_header Access-Control-Allow-Headers "Authorization, Content-Type";
# Handle preflight requests
if ($request_method = 'OPTIONS') {
return 204;
}
# Route user service requests
location /api/users/ {
proxy_pass http://user-service;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Why timeouts? Prevent hanging requests
proxy_connect_timeout 5s;
proxy_send_timeout 10s;
proxy_read_timeout 10s;
}
# Route product service requests
location /api/products/ {
proxy_pass http://product-service;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_connect_timeout 5s;
proxy_send_timeout 10s;
proxy_read_timeout 10s;
}
# Health check endpoint
location /health {
return 200 "API Gateway is healthy\n";
add_header Content-Type text/plain;
}
# Default location for unmatched requests
location / {
return 404 "Service not found";
add_header Content-Type text/plain;
}
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-gateway
namespace: microservices
labels:
app: api-gateway
app.kubernetes.io/name: nginx
app.kubernetes.io/component: gateway
spec:
replicas: 2 # Why 2? High availability for the entry point
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
selector:
matchLabels:
app: api-gateway
template:
metadata:
labels:
app: api-gateway
version: v1
spec:
containers:
- name: nginx
image: nginx:1.25-alpine
ports:
- containerPort: 80
name: http
protocol: TCP
volumeMounts:
- name: nginx-config
mountPath: /etc/nginx/nginx.conf
subPath: nginx.conf
# Why health checks for gateway? Critical component needs monitoring
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 5
periodSeconds: 5
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
volumes:
- name: nginx-config
configMap:
name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
name: api-gateway
namespace: microservices
labels:
app: api-gateway
app.kubernetes.io/name: nginx
app.kubernetes.io/component: gateway
spec:
selector:
app: api-gateway
ports:
- port: 80
targetPort: 80
protocol: TCP
name: http
type: LoadBalancer # Why LoadBalancer? Need external access for clients
Deploying the Application
Deployment Order Matters: We deploy dependencies first (database, cache) before the services that depend on them.
Apply all the manifests:
# Create namespace first
kubectl apply -f k8s/base/namespace.yaml
# Deploy infrastructure components (database, cache)
kubectl apply -f k8s/base/postgres.yaml
kubectl apply -f k8s/base/redis.yaml
# Wait for database to be ready - Why wait?
# Services will crash if they can't connect to dependencies
kubectl wait --for=condition=ready pod -l app=postgres -n microservices --timeout=300s
# Deploy application services
kubectl apply -f k8s/base/user-service.yaml
kubectl apply -f k8s/base/product-service.yaml
# Wait for services to be ready
kubectl wait --for=condition=ready pod -l app=user-service -n microservices --timeout=180s
kubectl wait --for=condition=ready pod -l app=product-service -n microservices --timeout=180s
# Deploy API gateway last - Why last?
# Gateway needs backend services to be ready for health checks
kubectl apply -f k8s/base/api-gateway.yaml
# Check deployment status
kubectl get all -n microservices
# Test the deployment
kubectl get svc api-gateway -n microservices
# Get the LoadBalancer IP and test: curl http://<EXTERNAL-IP>/health
Verification Commands:
# Check pod status
kubectl get pods -n microservices -o wide
# Check service endpoints
kubectl get endpoints -n microservices
# View pod logs if there are issues
kubectl logs -l app=user-service -n microservices --tail=50
# Test internal connectivity
kubectl run test-pod --image=busybox --rm -it --restart=Never -n microservices -- wget -qO- http://user-service:3001/health
Troubleshooting: If pods fail to start, common issues include:
- Image pull errors: check image names and registry access
- Resource constraints: check node capacity with kubectl describe nodes
- Configuration errors: validate manifests with kubeconform or kubectl apply --dry-run=client -f <file>
- Dependency issues: ensure databases are ready before starting services
This foundation provides a solid base for understanding how Kubernetes orchestrates microservices. In the following sections, we’ll explore advanced topics like service discovery, configuration management, and production deployment strategies.
Service Discovery and Communication
Kubernetes provides built-in service discovery through DNS and environment variables, but understanding the mechanisms and best practices is crucial for reliable microservice communication.
How Kubernetes Service Discovery Works
When you create a Service in Kubernetes, the cluster automatically:
- Assigns a Cluster IP: A virtual IP address that load balances to healthy pods
- Creates DNS Records: Services become discoverable via DNS names
- Updates Environment Variables: Injects service endpoint information into pods
- Configures kube-proxy: Sets up network rules for traffic routing
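Once our services are deployed, you can watch these mechanisms at work with a few read-only commands:
# The virtual ClusterIP assigned to the service
kubectl get svc user-service -n microservices
# The pod IPs behind it (kept up to date as pods come and go)
kubectl get endpoints user-service -n microservices
# Service host/port environment variables injected into a pod
# (these are captured when the pod starts, not updated afterwards)
kubectl exec deploy/product-service -n microservices -- env | grep USER_SERVICE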
DNS-Based Discovery
Kubernetes creates DNS records for services following the pattern:
<service-name>.<namespace>.svc.cluster.local
For our microservices:
# From any pod in the microservices namespace
nslookup user-service
# Returns: user-service.microservices.svc.cluster.local
# Short form works within the same namespace
curl http://user-service:3001/health
# Full form works across namespaces
curl http://user-service.microservices.svc.cluster.local:3001/health
Service Types and Use Cases
ClusterIP (Default): Internal-only access, perfect for microservice-to-microservice communication:
apiVersion: v1
kind: Service
metadata:
name: internal-service
spec:
type: ClusterIP # Default, can be omitted
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
NodePort: Exposes service on each node’s IP at a static port:
apiVersion: v1
kind: Service
metadata:
name: nodeport-service
spec:
type: NodePort
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
nodePort: 30080 # Optional, will be assigned if not specified
LoadBalancer: Provisions an external load balancer (cloud provider dependent):
apiVersion: v1
kind: Service
metadata:
name: loadbalancer-service
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- port: 80
targetPort: 8080
Advanced Service Discovery with Headless Services
Headless services (clusterIP: None) return the IP addresses of individual pods rather than a single service IP. This is useful for:
- Database clustering
- StatefulSets that need to address specific pod instances
- Service meshes that handle their own load balancing
apiVersion: v1
kind: Service
metadata:
name: headless-service
spec:
clusterIP: None # Makes it headless
selector:
app: database-cluster
ports:
- port: 5432
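Resolving a headless service returns one A record per ready pod instead of a single virtual IP (the addresses below are illustrative):
# From a pod in the same namespace
nslookup headless-service
# Returns the individual pod IPs, e.g.:
#   10.244.1.12
#   10.244.2.8
# Pods managed by a StatefulSet additionally get stable per-pod DNS names:
#   <pod-name>.headless-service.microservices.svc.cluster.local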
Implementing Circuit Breakers and Retry Logic
For resilient microservice communication, implement circuit breakers and retry logic:
// Circuit breaker implementation in Node.js
class CircuitBreaker {
constructor(threshold = 5, timeout = 60000) {
this.threshold = threshold;
this.timeout = timeout;
this.failureCount = 0;
this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
this.nextAttempt = Date.now();
}
async call(fn) {
if (this.state === 'OPEN') {
if (Date.now() < this.nextAttempt) {
throw new Error('Circuit breaker is OPEN');
}
this.state = 'HALF_OPEN';
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
onSuccess() {
this.failureCount = 0;
this.state = 'CLOSED';
}
onFailure() {
this.failureCount++;
if (this.failureCount >= this.threshold) {
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.timeout;
}
}
}
// Usage in service communication
const circuitBreaker = new CircuitBreaker(3, 30000);
async function callUserService(userId) {
return circuitBreaker.call(async () => {
const response = await fetch(`http://user-service:3001/api/users/${userId}`);
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
return response.json();
});
}
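The circuit breaker decides whether to attempt a call at all; retry logic decides what to do when a single attempt fails. A minimal retry helper with exponential backoff and full jitter, meant to wrap the circuit-breaker call above:
// Retry with exponential backoff and jitter (a minimal sketch)
async function withRetry(fn, { retries = 3, baseDelayMs = 100, maxDelayMs = 2000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries) throw error;
      // Full jitter: random delay between 0 and the exponential cap
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      const delay = Math.random() * cap;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
// Usage (inside an async function): retries stop early once the circuit opens,
// because an open circuit fails fast without touching the network.
// const user = await withRetry(() => callUserService('42'));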
Service Mesh Integration
For complex microservice architectures, consider a service mesh like Istio or Linkerd:
# Istio service entry for external service
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
name: external-payment-service
spec:
hosts:
- payment-api.external.com
ports:
- number: 443
name: https
protocol: HTTPS
location: MESH_EXTERNAL
resolution: DNS
---
# Virtual service for traffic routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service-routing
spec:
hosts:
- user-service
http:
- match:
- headers:
version:
exact: v2
route:
- destination:
host: user-service
subset: v2
- route:
- destination:
host: user-service
subset: v1
Configuration Management and Secrets
Proper configuration management is essential for microservice deployments across different environments.
ConfigMaps for Non-Sensitive Configuration
ConfigMaps store configuration data as key-value pairs:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: microservices
data:
# Simple key-value pairs
database.host: "postgres-service"
database.port: "5432"
redis.ttl: "3600"
# Configuration files
app.properties: |
spring.datasource.url=jdbc:postgresql://postgres-service:5432/ecommerce
spring.jpa.hibernate.ddl-auto=update
logging.level.root=INFO
nginx.conf: |
upstream backend {
server product-service:3002;
server user-service:3001 backup;
}
Secrets for Sensitive Data
Secrets store sensitive information with base64 encoding (not encryption):
apiVersion: v1
kind: Secret
metadata:
name: database-credentials
namespace: microservices
type: Opaque
data:
username: cG9zdGdyZXM= # postgres
password: c2VjdXJlUGFzcw== # securePass
---
# TLS secret for HTTPS
apiVersion: v1
kind: Secret
metadata:
name: tls-secret
namespace: microservices
type: kubernetes.io/tls
data:
tls.crt: LS0tLS1CRUdJTi... # base64 encoded certificate
tls.key: LS0tLS1CRUdJTi... # base64 encoded private key
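The values under data: must be base64-encoded by hand (alternatively, the stringData: field accepts plain text and Kubernetes encodes it for you). To encode and decode at a shell:
# Encode (-n keeps a trailing newline out of the value)
echo -n 'securePass' | base64   # c2VjdXJlUGFzcw==
# Decode to verify
echo 'c2VjdXJlUGFzcw==' | base64 -d   # securePass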
Create secrets imperatively:
# From literal values
kubectl create secret generic api-key \
--from-literal=key=abc123 \
--namespace=microservices
# From files
kubectl create secret generic ssl-certs \
--from-file=tls.crt=./server.crt \
--from-file=tls.key=./server.key \
--namespace=microservices
# Docker registry secret
kubectl create secret docker-registry regcred \
--docker-server=your-registry.com \
--docker-username=your-username \
--docker-password=your-password \
--namespace=microservices
Using ConfigMaps and Secrets in Deployments
apiVersion: apps/v1
kind: Deployment
metadata:
name: configurable-service
spec:
template:
spec:
containers:
- name: app
image: my-app:latest
env:
# Single values from ConfigMap
- name: DATABASE_HOST
valueFrom:
configMapKeyRef:
name: app-config
key: database.host
# Single values from Secret
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: database-credentials
key: password
# All ConfigMap keys as environment variables
envFrom:
- configMapRef:
name: app-config
- secretRef:
name: database-credentials
volumeMounts:
# Mount ConfigMap as files
- name: config-volume
mountPath: /etc/config
# Mount Secret as files
- name: secret-volume
mountPath: /etc/secrets
readOnly: true
volumes:
- name: config-volume
configMap:
name: app-config
- name: secret-volume
secret:
secretName: database-credentials
defaultMode: 0400 # Read-only for owner
Environment-Specific Configuration with Kustomize
Kustomize enables environment-specific configuration management:
# k8s/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- postgres.yaml
- user-service.yaml
- product-service.yaml
- api-gateway.yaml
commonLabels:
app.kubernetes.io/version: v1.0.0
app.kubernetes.io/managed-by: kustomize
# k8s/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: prod-
namespace: microservices-prod
resources:
- ../../base
patchesStrategicMerge:
- replica-count.yaml
- resource-limits.yaml
configMapGenerator:
- name: app-config
literals:
- database.host=prod-postgres-service
- log.level=ERROR
secretGenerator:
- name: database-credentials
literals:
- username=produser
- password=prodpassword
images:
- name: your-registry/user-service
newTag: v1.2.3
- name: your-registry/product-service
newTag: v1.2.3
Deploy with Kustomize:
# Apply base configuration
kubectl apply -k k8s/base/
# Apply production overlay
kubectl apply -k k8s/overlays/production/
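Before applying, it is worth rendering an overlay to confirm the patches and generators produce what you expect:
# Render the overlay without applying it
kubectl kustomize k8s/overlays/production/
# Or diff the rendered output against the live cluster state
kubectl diff -k k8s/overlays/production/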
Persistent Storage in Microservices
Stateful services require persistent storage that survives pod restarts and rescheduling.
Storage Classes and Provisioning
Define storage classes for different storage types:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com # AWS EBS CSI driver (gp3 and its iops/throughput parameters require the CSI provisioner, not the in-tree kubernetes.io/aws-ebs)
parameters:
type: gp3
iops: "3000"
throughput: "125"
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: slow-hdd
provisioner: ebs.csi.aws.com
parameters:
type: sc1
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
Persistent Volume Claims
Request storage for your applications:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: database-storage
namespace: microservices
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 50Gi
---
# For shared storage (NFS, etc.)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-storage
namespace: microservices
spec:
accessModes:
- ReadWriteMany
storageClassName: nfs-storage
resources:
requests:
storage: 100Gi
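A claim does nothing until a pod references it. A pod-spec fragment mounting the database-storage claim above (the mount path assumes the standard postgres image):
# Fragment of a pod template
containers:
- name: postgres
  image: postgres:13
  volumeMounts:
  - name: data
    mountPath: /var/lib/postgresql/data
volumes:
- name: data
  persistentVolumeClaim:
    claimName: database-storage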
StatefulSets for Ordered Deployment
StatefulSets provide stable network identities and persistent storage:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis-cluster
namespace: microservices
spec:
serviceName: redis-cluster
replicas: 3
selector:
matchLabels:
app: redis-cluster
template:
metadata:
labels:
app: redis-cluster
spec:
containers:
- name: redis
image: redis:6.2-alpine
ports:
- containerPort: 6379
- containerPort: 16379
command:
- redis-server
- /conf/redis.conf
volumeMounts:
- name: redis-data
mountPath: /data
- name: redis-config
mountPath: /conf
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
volumes:
- name: redis-config
configMap:
name: redis-config
volumeClaimTemplates:
- metadata:
name: redis-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 10Gi
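Note that serviceName: redis-cluster refers to a governing headless Service that the StatefulSet requires but the manifest above does not create; it is what gives each pod its stable DNS identity. A minimal sketch:
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
  namespace: microservices
spec:
  clusterIP: None  # Headless: per-pod DNS names like redis-cluster-0.redis-cluster
  selector:
    app: redis-cluster
  ports:
  - name: redis
    port: 6379
  - name: gossip
    port: 16379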
Backup and Disaster Recovery
Implement backup strategies for persistent data:
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgres-backup
namespace: microservices
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: postgres:13
command:
- sh
- -c
- |
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
pg_dump -h postgres-service -U postgres -d ecommerce > /backup/backup_$TIMESTAMP.sql
              # Upload to S3 or other storage (note: the stock postgres image lacks the AWS CLI; use an image that bundles it)
aws s3 cp /backup/backup_$TIMESTAMP.sql s3://backup-bucket/postgres/
# Clean up old local backups
find /backup -name "backup_*.sql" -mtime +7 -delete
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: postgres-password
volumeMounts:
- name: backup-storage
mountPath: /backup
volumes:
- name: backup-storage
persistentVolumeClaim:
claimName: backup-pvc
restartPolicy: OnFailure
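Backups only matter if restores work, so rehearse the restore path regularly. A minimal sketch, assuming the Postgres Deployment is named postgres and a dump file sits on your machine (the filename here is illustrative):
# Stream a local dump into psql running inside the postgres pod
kubectl exec -i -n microservices deploy/postgres -- \
  psql -U postgres -d ecommerce < backup_20240101_020000.sql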
Monitoring and Observability
Comprehensive monitoring is crucial for maintaining microservice health and performance.
Prometheus and Grafana Setup
Deploy monitoring stack using Helm:
# Add Prometheus helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values monitoring-values.yaml
Custom monitoring values:
# monitoring-values.yaml
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: fast-ssd
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
grafana:
  adminPassword: admin123 # Demo value only - source real credentials from a Secret
persistence:
enabled: true
storageClassName: fast-ssd
size: 10Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: fast-ssd
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
Application Metrics with Prometheus
Instrument your Node.js services with Prometheus metrics:
// metrics.js
const promClient = require('prom-client');
// Create a Registry to register the metrics
const register = new promClient.Registry();
// Add default process/runtime metrics, tagged with the service name
// (prom-client v13+ dropped the old app/timeout options)
register.setDefaultLabels({ app: 'user-service' });
promClient.collectDefaultMetrics({
  gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5],
  register
});
// Custom metrics
const httpRequestsTotal = new promClient.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code'],
registers: [register]
});
const httpRequestDuration = new promClient.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10],
registers: [register]
});
const activeConnections = new promClient.Gauge({
name: 'active_connections',
help: 'Number of active connections',
registers: [register]
});
// Middleware for Express
function metricsMiddleware(req, res, next) {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
const route = req.route ? req.route.path : req.url;
httpRequestsTotal.inc({
method: req.method,
route: route,
status_code: res.statusCode
});
httpRequestDuration.observe(
{
method: req.method,
route: route,
status_code: res.statusCode
},
duration
);
});
next();
}
// Metrics endpoint (async: register.metrics() returns a Promise in prom-client v13+)
async function metricsEndpoint(req, res) {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
}
module.exports = {
register,
httpRequestsTotal,
httpRequestDuration,
activeConnections,
metricsMiddleware,
metricsEndpoint
};
Use in your application:
// app.js
const express = require('express');
const { metricsMiddleware, metricsEndpoint } = require('./metrics');
const app = express();
// Add metrics middleware
app.use(metricsMiddleware);
// Metrics endpoint for Prometheus scraping
app.get('/metrics', metricsEndpoint);
// Your regular routes
app.get('/health', (req, res) => {
res.json({ status: 'healthy' });
});
// ... rest of your application
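Once Prometheus scrapes /metrics, the metric names defined above support the usual dashboard and alerting queries (PromQL):
# Requests per second, by route
sum(rate(http_requests_total[5m])) by (route)
# 95th percentile latency, by route
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))
# Error ratio: share of requests returning 5xx
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))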
ServiceMonitor for Prometheus
Configure Prometheus to scrape your services:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: microservices-monitor
namespace: monitoring
labels:
release: monitoring # Must match Prometheus selector
spec:
selector:
matchLabels:
monitoring: enabled
endpoints:
- port: http
path: /metrics
interval: 30s
namespaceSelector:
matchNames:
- microservices
Add monitoring labels to your services:
apiVersion: v1
kind: Service
metadata:
name: user-service
namespace: microservices
labels:
app: user-service
monitoring: enabled # Required for ServiceMonitor
spec:
selector:
app: user-service
ports:
- name: http
port: 3001
targetPort: 3001
Distributed Tracing with Jaeger
Deploy Jaeger for distributed tracing:
# Install Jaeger operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.29.0/jaeger-operator.yaml -n observability
# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simplest
namespace: observability
spec:
strategy: production
storage:
type: elasticsearch
elasticsearch:
nodeCount: 3
redundancyPolicy: SingleRedundancy
storage:
storageClassName: fast-ssd
size: 50Gi
EOF
Instrument services with OpenTelemetry:
// tracing.js
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
// Initialize the tracer provider
const provider = new NodeTracerProvider({
resource: new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: process.env.SERVICE_NAME || 'unknown-service',
[SemanticResourceAttributes.SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
}),
});
// Configure Jaeger exporter
const jaegerExporter = new JaegerExporter({
endpoint: process.env.JAEGER_ENDPOINT || 'http://simplest-collector.observability:14268/api/traces',
});
provider.addSpanProcessor(new BatchSpanProcessor(jaegerExporter));
provider.register();
// Register instrumentations
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
new ExpressInstrumentation(),
],
});
console.log('Tracing initialized');
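Auto-instrumentation covers inbound and outbound HTTP; for business logic you can open custom spans through the OpenTelemetry API. A sketch, where fetchUser and fetchOrders stand in for your own data-access helpers:
// custom-span.js
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('user-service');
async function getUserWithOrders(userId) {
  // startActiveSpan makes this span the parent of anything created inside it
  return tracer.startActiveSpan('getUserWithOrders', async (span) => {
    try {
      span.setAttribute('user.id', userId);
      const user = await fetchUser(userId);      // assumed helper
      const orders = await fetchOrders(userId);  // assumed helper
      return { ...user, orders };
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}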
Log Aggregation with ELK Stack
Deploy Elasticsearch, Logstash, and Kibana for log aggregation:
# elasticsearch.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: elasticsearch
namespace: logging
spec:
version: 7.15.0
nodeSets:
- name: default
count: 3
config:
node.store.allow_mmap: false
podTemplate:
spec:
containers:
- name: elasticsearch
resources:
requests:
memory: 2Gi
cpu: 1
limits:
memory: 4Gi
cpu: 2
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd
---
# kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
name: kibana
namespace: logging
spec:
version: 7.15.0
count: 1
elasticsearchRef:
name: elasticsearch
http:
service:
spec:
type: LoadBalancer
Use Fluent Bit for log collection:
# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
data:
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Refresh_Interval 5
Mem_Buf_Limit 50MB
Skip_Long_Lines On
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
K8S-Logging.Parser On
K8S-Logging.Exclude Off
[OUTPUT]
Name es
Match kube.*
Host elasticsearch-es-http.logging
Port 9200
HTTP_User elastic
HTTP_Passwd ${ELASTICSEARCH_PASSWORD}
Index kubernetes-logs
Type _doc
parsers.conf: |
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
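The ConfigMap by itself collects nothing; Fluent Bit runs as a DaemonSet so one collector lands on every node. A trimmed sketch (in practice the official Helm chart handles the RBAC and full mount set; the secret name follows the ECK operator's naming convention):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit # needs RBAC to read pod metadata
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.9
        env:
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-es-elastic-user # created by the ECK operator
              key: elastic
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: config
        configMap:
          name: fluent-bit-config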
Production Deployment Strategies
Moving from development to production requires careful consideration of deployment strategies, scaling, and reliability.
Blue-Green Deployments
Blue-green deployment maintains two identical production environments:
# blue-green-deployment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: user-service-rollout
namespace: microservices
spec:
replicas: 5
strategy:
blueGreen:
activeService: user-service-active
previewService: user-service-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 30
prePromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: user-service-preview
postPromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: user-service-active
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
spec:
containers:
- name: user-service
image: your-registry/user-service:latest
ports:
- containerPort: 3001
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "512Mi"
cpu: "400m"
---
apiVersion: v1
kind: Service
metadata:
name: user-service-active
namespace: microservices
spec:
selector:
app: user-service
ports:
- port: 3001
targetPort: 3001
---
apiVersion: v1
kind: Service
metadata:
name: user-service-preview
namespace: microservices
spec:
selector:
app: user-service
ports:
- port: 3001
targetPort: 3001
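Because autoPromotionEnabled is false, the rollout pauses once the preview stack is healthy and waits for a human or pipeline to promote it (the success-rate AnalysisTemplate referenced above must be defined separately). With the Argo Rollouts kubectl plugin:
# Watch the rollout and its preview/active services
kubectl argo rollouts get rollout user-service-rollout -n microservices --watch
# Promote preview to active once checks pass
kubectl argo rollouts promote user-service-rollout -n microservices
# Roll back if something looks wrong
kubectl argo rollouts undo user-service-rollout -n microservices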
Canary Deployments
Gradually roll out changes to a subset of users:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: product-service-rollout
namespace: microservices
spec:
replicas: 10
strategy:
canary:
steps:
- setWeight: 20
- pause: {}
- setWeight: 40
- pause: {duration: 10}
- setWeight: 60
- pause: {duration: 10}
- setWeight: 80
- pause: {duration: 10}
canaryService: product-service-canary
stableService: product-service-stable
trafficRouting:
istio:
virtualService:
name: product-service-vs
routes:
- primary
selector:
matchLabels:
app: product-service
template:
metadata:
labels:
app: product-service
spec:
containers:
- name: product-service
image: your-registry/product-service:latest
ports:
- containerPort: 3002
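The trafficRouting block refers to an Istio VirtualService named product-service-vs with a route named primary, which you create yourself; Argo Rollouts then rewrites its weights at each canary step. A minimal sketch:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service-vs
  namespace: microservices
spec:
  hosts:
  - product-service
  http:
  - name: primary  # Argo Rollouts updates the weights on this route
    route:
    - destination:
        host: product-service-stable
      weight: 100
    - destination:
        host: product-service-canary
      weight: 0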
Horizontal Pod Autoscaler (HPA)
Automatically scale based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: user-service-hpa
namespace: microservices
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
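The http_requests_per_second pods metric is not built in: it must be served through the custom metrics API, typically by the Prometheus Adapter. A sketch of an adapter rule (Helm values) deriving it from the http_requests_total counter instrumented earlier:
# prometheus-adapter values (sketch): expose a per-second rate of the counter
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'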
Vertical Pod Autoscaler (VPA)
Automatically adjust resource requests and limits:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: product-service-vpa
namespace: microservices
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: product-service
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: product-service
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 1
memory: 1Gi
controlledResources: ["cpu", "memory"]
Multi-Cluster Deployments
For high availability across regions:
# cluster-1-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: microservices
labels:
cluster: us-east-1
spec:
replicas: 3
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
cluster: us-east-1
spec:
containers:
- name: user-service
image: your-registry/user-service:v1.2.3
env:
- name: CLUSTER_REGION
value: "us-east-1"
- name: DATABASE_REPLICA
value: "read-replica-east"
---
# cluster-2-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: microservices
labels:
cluster: us-west-2
spec:
replicas: 3
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
cluster: us-west-2
spec:
containers:
- name: user-service
image: your-registry/user-service:v1.2.3
env:
- name: CLUSTER_REGION
value: "us-west-2"
- name: DATABASE_REPLICA
value: "read-replica-west"
Security Best Practices
Security must be built into every layer of your Kubernetes microservice architecture.
Pod Security Standards
Implement Pod Security Standards to enforce security policies:
apiVersion: v1
kind: Namespace
metadata:
name: microservices
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
Security Contexts
Configure security contexts for containers:
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-service
spec:
template:
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: your-app:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- name: tmp
mountPath: /tmp
- name: var-run
mountPath: /var/run
volumes:
- name: tmp
emptyDir: {}
- name: var-run
emptyDir: {}
Network Policies
Control traffic flow between pods:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: microservices-network-policy
namespace: microservices
spec:
podSelector:
matchLabels:
app: user-service
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: api-gateway
- namespaceSelector:
matchLabels:
name: monitoring
podSelector:
matchLabels:
app: prometheus
ports:
- protocol: TCP
port: 3001
egress:
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- protocol: TCP
port: 5432
- to: [] # Allow DNS
ports:
- protocol: UDP
port: 53
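Policies like this are most effective on top of a default-deny baseline, so that anything not explicitly allowed is blocked:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: microservices
spec:
  podSelector: {}  # Selects every pod in the namespace
  policyTypes:
  - Ingress
  - Egress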
RBAC (Role-Based Access Control)
Implement least privilege access:
apiVersion: v1
kind: ServiceAccount
metadata:
name: microservice-sa
namespace: microservices
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: microservices
name: microservice-role
rules:
- apiGroups: [""]
resources: ["secrets", "configmaps"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: microservice-rolebinding
namespace: microservices
subjects:
- kind: ServiceAccount
name: microservice-sa
namespace: microservices
roleRef:
kind: Role
name: microservice-role
apiGroup: rbac.authorization.k8s.io
---
# Use in deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
spec:
template:
spec:
serviceAccountName: microservice-sa
containers:
- name: user-service
image: your-registry/user-service:latest
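You can verify the binding behaves as intended with kubectl auth can-i, impersonating the service account:
# Should succeed (secrets are readable per the Role)
kubectl auth can-i get secrets -n microservices \
  --as=system:serviceaccount:microservices:microservice-sa
# Should be denied (no delete verb was granted)
kubectl auth can-i delete pods -n microservices \
  --as=system:serviceaccount:microservices:microservice-sa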
Secret Management with External Secrets
Use external secret management systems:
# Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
# SecretStore for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secrets-manager
namespace: microservices
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
secretRef:
accessKeyID:
name: aws-credentials
key: access-key-id
secretAccessKey:
name: aws-credentials
key: secret-access-key
---
# External Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
namespace: microservices
spec:
refreshInterval: 15s
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: database-credentials
creationPolicy: Owner
data:
- secretKey: username
remoteRef:
key: prod/database
property: username
- secretKey: password
remoteRef:
key: prod/database
property: password
Image Security Scanning
Implement image security scanning in your CI/CD pipeline:
# .github/workflows/security-scan.yml
name: Security Scan
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
security-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build image
        run: docker build -t user-service:${{ github.sha }} ./user-service
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
          image-ref: 'user-service:${{ github.sha }}'
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload Trivy scan results to GitHub Security tab
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: 'trivy-results.sarif'
Troubleshooting Common Issues
Understanding how to diagnose and resolve common Kubernetes issues is essential for maintaining microservice deployments.
Pod Issues
Pod stuck in Pending state:
# Check pod details
kubectl describe pod <pod-name> -n microservices
# Common causes and solutions:
# 1. Insufficient resources
kubectl get nodes
kubectl describe node <node-name>
# 2. Image pull issues
kubectl get events -n microservices --sort-by='.lastTimestamp'
# 3. Volume mount issues
kubectl get pv,pvc -n microservices
Pod CrashLoopBackOff:
# Check pod logs
kubectl logs <pod-name> -n microservices --previous
# Check resource limits
kubectl describe pod <pod-name> -n microservices
# Debug with temporary container
kubectl run debug-pod --image=busybox --rm -it --restart=Never -- /bin/sh
Service Discovery Issues
Services not reachable:
# Test DNS resolution
kubectl run test-pod --image=busybox --rm -it --restart=Never -- nslookup user-service.microservices.svc.cluster.local
# Check service endpoints
kubectl get endpoints user-service -n microservices
# Test connectivity
kubectl run test-pod --image=busybox --rm -it --restart=Never -- wget -qO- http://user-service:3001/health
Storage Issues
PVC stuck in Pending:
# Check storage classes
kubectl get storageclass
# Check PVC details
kubectl describe pvc <pvc-name> -n microservices
# Check available PVs
kubectl get pv
Networking Issues
Network policies blocking traffic:
# List network policies
kubectl get networkpolicy -n microservices
# Temporarily delete a suspect policy to test, then re-apply it
kubectl delete networkpolicy <policy-name> -n microservices
# Check CNI plugin logs
kubectl logs -n kube-system -l k8s-app=calico-node
Resource Issues
Out of Memory (OOMKilled):
# Check resource usage
kubectl top pods -n microservices
# Check resource limits
kubectl describe pod <pod-name> -n microservices
# View OOM events
kubectl get events -n microservices --field-selector reason=OOMKilling
Debugging Techniques
Interactive debugging:
# Exec into running container
kubectl exec -it <pod-name> -n microservices -- /bin/bash
# Debug with ephemeral container (Kubernetes 1.23+)
kubectl debug <pod-name> -n microservices -it --image=busybox --target=<container-name>
# Copy files from/to pod
kubectl cp <pod-name>:/path/to/file ./local-file -n microservices
kubectl cp ./local-file <pod-name>:/path/to/file -n microservices
Port forwarding for local testing:
# Forward service port to local machine
kubectl port-forward service/user-service 8080:3001 -n microservices
# Forward pod port directly
kubectl port-forward pod/<pod-name> 8080:3001 -n microservices
Advanced Patterns and Next Steps
As your Kubernetes microservice architecture matures, consider these advanced patterns and technologies.
Service Mesh with Istio
Service mesh provides advanced traffic management, security, and observability:
# Install Istio
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH
istioctl install --set values.defaultRevision=default
# Enable injection for namespace
kubectl label namespace microservices istio-injection=enabled
Advanced traffic management:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service-vs
namespace: microservices
spec:
hosts:
- user-service
http:
- match:
- headers:
user-type:
exact: premium
route:
- destination:
host: user-service
subset: v2
weight: 100
- route:
- destination:
host: user-service
subset: v1
weight: 80
- destination:
host: user-service
subset: v2
weight: 20
fault:
delay:
percentage:
value: 0.1
fixedDelay: 5s
timeout: 10s
retries:
attempts: 3
perTryTimeout: 2s
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: user-service-dr
namespace: microservices
spec:
host: user-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
maxRequestsPerConnection: 2
loadBalancer:
simple: LEAST_CONN
outlierDetection:
consecutiveErrors: 3
interval: 30s
baseEjectionTime: 30s
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
GitOps with ArgoCD
Implement GitOps for declarative deployments:
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
ArgoCD Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: microservices-app
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/your-org/k8s-microservices
targetRevision: HEAD
path: k8s/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: microservices
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Chaos Engineering with Chaos Mesh
Test system resilience:
# Install Chaos Mesh
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash
Chaos experiments:
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
name: user-service-failure
namespace: microservices
spec:
action: pod-failure
mode: one
duration: "30s"
selector:
namespaces:
- microservices
labelSelectors:
app: user-service
scheduler:
cron: "@every 10m"
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
name: network-delay
namespace: microservices
spec:
action: delay
mode: all
selector:
namespaces:
- microservices
labelSelectors:
app: product-service
delay:
latency: "100ms"
correlation: "100"
jitter: "0ms"
duration: "5m"
Event-Driven Architecture with NATS
Implement event-driven communication:
# Install NATS
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm install nats nats/nats
Event producer service:
// event-producer.js
const { connect, StringCodec } = require('nats');
const { randomUUID } = require('crypto'); // used for event IDs below
class EventProducer {
constructor() {
this.nc = null;
this.sc = StringCodec();
}
async connect() {
this.nc = await connect({
servers: 'nats://nats:4222'
});
console.log('Connected to NATS');
}
async publishUserEvent(eventType, userId, data) {
const event = {
      id: randomUUID(),
type: eventType,
userId: userId,
timestamp: new Date().toISOString(),
data: data
};
await this.nc.publish(`user.${eventType}`, this.sc.encode(JSON.stringify(event)));
console.log(`Published event: user.${eventType}`);
}
async close() {
await this.nc.close();
}
}
// Usage in user service (from inside an async startup function)
const eventProducer = new EventProducer();
await eventProducer.connect();
// After user registration
await eventProducer.publishUserEvent('registered', user.id, {
email: user.email,
name: user.name
});
Event consumer service:
// event-consumer.js
const { connect, StringCodec } = require('nats');
class EventConsumer {
constructor() {
this.nc = null;
this.sc = StringCodec();
}
async connect() {
this.nc = await connect({
servers: 'nats://nats:4222'
});
console.log('Connected to NATS');
}
async subscribeToUserEvents() {
const sub = this.nc.subscribe('user.*');
for await (const msg of sub) {
try {
const event = JSON.parse(this.sc.decode(msg.data));
await this.handleUserEvent(event);
} catch (error) {
console.error('Error processing event:', error);
}
}
}
async handleUserEvent(event) {
switch (event.type) {
case 'registered':
await this.sendWelcomeEmail(event.data.email);
break;
case 'updated':
await this.updateUserProfile(event.userId, event.data);
break;
default:
console.log(`Unknown event type: ${event.type}`);
}
}
}
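Wiring the consumer up at service startup might look like this (the handler methods above, such as sendWelcomeEmail, are placeholders you would implement):
// Start the consumer; subscribeToUserEvents() loops for the process lifetime
const consumer = new EventConsumer();
(async () => {
  await consumer.connect();
  await consumer.subscribeToUserEvents();
})().catch((err) => {
  console.error('Consumer failed:', err);
  process.exit(1);
});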
Conclusion
Kubernetes provides a powerful foundation for deploying and managing microservice architectures at scale. Throughout this comprehensive guide, we’ve explored:
- Core Concepts: Understanding Pods, Services, Deployments, and other fundamental resources
- Practical Implementation: Building and deploying a realistic microservice architecture
- Production Readiness: Implementing monitoring, security, and deployment strategies
- Advanced Patterns: Service mesh, GitOps, and event-driven architectures
Key Takeaways
Start Simple: Begin with basic Kubernetes concepts and gradually introduce complexity as your understanding and requirements grow.
Embrace Declarative Configuration: Use YAML manifests and tools like Kustomize to manage configuration across environments.
Implement Observability Early: Set up monitoring, logging, and tracing from the beginning to understand system behavior.
Security is Paramount: Apply security best practices at every layer, from container images to network policies.
Automate Everything: Use CI/CD pipelines, GitOps, and infrastructure as code to minimize manual operations.
Next Steps
As you continue your Kubernetes journey:
- Practice with Real Workloads: Deploy actual applications to understand practical challenges
- Join the Community: Participate in Kubernetes forums, conferences, and local meetups
- Stay Updated: Kubernetes evolves rapidly; follow release notes and best practices
- Explore Ecosystem: Investigate tools like Helm, Istio, and various monitoring solutions
- Consider Certification: Pursue Kubernetes certifications (CKA, CKAD, CKS) to validate your skills
The microservice architecture pattern, combined with Kubernetes orchestration, provides a powerful foundation for building scalable, resilient applications. While the initial learning curve is steep, the operational benefits and architectural flexibility make it worthwhile for most modern applications.
Remember that technology is just one aspect of successful microservice deployments. Pay equal attention to organizational structure, team communication, and operational practices to fully realize the benefits of this architectural approach.
This guide provides a comprehensive introduction to Kubernetes microservice deployment. For the latest information and updates, always refer to the official Kubernetes documentation and community resources.