System Design
System design is the practice of architecting complex software systems so that they are scalable, reliable, and maintainable.
What is System Design?
System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.
Key Concepts
Scalability
Ability of a system to handle growing amounts of load.
Types:
- Vertical Scaling: Increasing resources of a single machine
- Horizontal Scaling: Adding more machines to distribute load
Reliability
Probability that a system performs its intended function without failure over a given period.
Techniques:
- Redundancy
- Failover mechanisms
- Load balancing
- Circuit breakers
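As a rough illustration of the last technique, a circuit breaker can be sketched in a few lines. The class name, the default threshold of 3 failures, the 30-second cooldown, and the injectable `clock` parameter are all assumptions made for this example, not part of any particular library.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.reset_timeout = reset_timeout          # seconds, assumed default
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of calling the unhealthy dependency.
                raise RuntimeError("circuit open: call rejected")
            # Cooldown elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a remote call as `breaker.call(fetch_profile, user_id)` (where `fetch_profile` is a hypothetical function) would then fail fast while the dependency is unhealthy instead of piling up slow requests.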
Availability
Percentage of time the system is operational.
High Availability (HA):
- 99.9% = 8.76 hours downtime/year
- 99.99% = 52.56 minutes downtime/year
- 99.999% = 5.26 minutes downtime/year
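The downtime figures above follow from multiplying the unavailable fraction by the length of a year. A small sketch, assuming a 365-day year (the function name is made up for this example):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def yearly_downtime_hours(availability_percent):
    """Return the downtime per year permitted by an availability target."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for target in (99.9, 99.99, 99.999):
    hours = yearly_downtime_hours(target)
    print(f"{target}% -> {hours:.4f} hours ({hours * 60:.2f} minutes) per year")
```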
Performance
System's responsiveness and throughput under various conditions.
Metrics:
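- Latency: Delay before a request begins to be served, including network and queuing time
- Throughput: Requests processed per unit of time, typically requests per second
- Response Time: Total time from sending a request to receiving the complete response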
System Design Process
1. Requirements Gathering
Understand functional and non-functional requirements.
Functional Requirements:
- What the system should do
- User stories and use cases
- Business logic
Non-Functional Requirements:
- Performance targets
- Scalability needs
- Security requirements
- Availability goals
2. Estimation
Calculate storage, bandwidth, and traffic requirements.
Storage Estimation:
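# Example: photo sharing service (all inputs are illustrative)
users_per_day = 1000      # users uploading photos each day
photos_per_user = 3       # photos uploaded per user per day
photo_size_mb = 2         # average photo size in MB
retention_days = 365      # how long each photo is kept
# New data written per day: 1000 * 3 * 2 = 6,000 MB (about 6 GB)
daily_storage_mb = users_per_day * photos_per_user * photo_size_mb
# Data accumulated over the retention period: 6,000 * 365 = 2,190,000 MB (about 2.19 TB)
yearly_storage_mb = daily_storage_mb * retention_days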
Traffic Estimation:
- Read/write ratio
- Peak traffic patterns
- Growth projections
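The three traffic factors above can be combined into a back-of-the-envelope request rate. Every number in this sketch (the 100:1 read/write ratio, the 3x peak factor, the 50% yearly growth) is an assumption chosen for illustration, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# All inputs below are illustrative assumptions.
writes_per_day = 3_000   # e.g. 1,000 users uploading 3 photos each
read_write_ratio = 100   # assumed: 100 reads for every write
peak_factor = 3          # assumed: peak traffic is 3x the daily average
yearly_growth = 0.5      # assumed: traffic grows 50% per year

avg_write_qps = writes_per_day / SECONDS_PER_DAY
avg_read_qps = avg_write_qps * read_write_ratio
peak_read_qps = avg_read_qps * peak_factor
# Compound the growth rate to size capacity two years ahead.
peak_read_qps_in_2_years = peak_read_qps * (1 + yearly_growth) ** 2

print(f"average reads/s: {avg_read_qps:.2f}")
print(f"peak reads/s now: {peak_read_qps:.2f}")
print(f"peak reads/s in 2 years: {peak_read_qps_in_2_years:.2f}")
```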
3. High-Level Design
Create overall system architecture.
Components:
- Load balancers
- Web servers
- Application servers
- Databases
- Caching layers
- Message queues
4. Detailed Design
Design individual components and their interactions.
Consider:
- Data models
- API design
- Database schema
- Caching strategy
- Security measures
5. Identify Bottlenecks
Find potential performance limitations.
Common Bottlenecks:
- Database queries
- Network latency
- Memory usage
- I/O operations
Common Architectural Patterns
Microservices Architecture
┌───────┐   ┌───────┐   ┌───────┐
│Service│   │Service│   │Service│
│   A   │   │   B   │   │   C   │
└───┬───┘   └───┬───┘   └───┬───┘
    │           │           │
    └───────────┼───────────┘
                │
         ┌──────┴──────┐
         │ API Gateway │
         └──────┬──────┘
                │
         ┌──────┴──────┐
         │   Client    │
         └─────────────┘
Advantages:
- Independent scaling
- Technology diversity
- Fault isolation
- Team autonomy
Challenges:
- Network complexity
- Distributed transactions
- Service discovery
- Monitoring complexity
Event-Driven Architecture
┌──────┐    ┌──────┐    ┌───────┐
│Event │    │Event │    │ Event │
│Source│───►│ Bus  │───►│Handler│
└──────┘    └──────┘    └───────┘
Components:
- Event producers
- Event bus/broker
- Event consumers
- Event stores
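To make the roles of these components concrete, here is a minimal in-memory sketch. The `EventBus` class and the `order_placed` event are hypothetical. A production system would typically use a dedicated broker with durable storage rather than a Python list.

```python
from collections import defaultdict


class EventBus:
    """In-memory event bus: producers publish, subscribed handlers consume."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event type -> list of handlers
        self.event_store = []               # append-only log of published events

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Record the event first so it can be replayed or audited later.
        self.event_store.append((event_type, payload))
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
received = []
bus.subscribe("order_placed", received.append)      # consumer
bus.publish("order_placed", {"order_id": 42})       # producer
```

The producer never references the consumer directly. That decoupling is the main point of the pattern.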
Serverless Architecture
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Function │     │ Function │     │ Function │
│ Service  │     │ Service  │     │ Service  │
└────┬─────┘     └────┬─────┘     └────┬─────┘
     │                │                │
     └────────────────┼────────────────┘
                      │
              ┌───────┴───────┐
              │     Cloud     │
              │   Provider    │
              └───────────────┘
Benefits:
- No server management
- Automatic scaling
- Pay-per-use
- Reduced operational complexity
Database Design
SQL vs NoSQL
| Factor | SQL | NoSQL |
|--------|-----|-------|
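| Schema | Fixed, enforced on write | Flexible or schema-less |
| Scaling | Primarily vertical; horizontal via replicas or sharding | Primarily horizontal |
| Consistency | Strong (ACID transactions) | Often eventual; tunable in many systems |
| Query Language | SQL | Varies by database |
| Use Case | Structured, relational data | Semi-structured or unstructured data |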
Database Patterns
Read Replicas:
- Primary database for writes
- Multiple replicas for reads
- Improves read performance
Sharding:
- Partition data across multiple databases
- Horizontal scaling
- Complex to implement
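One common way to decide which shard holds a record is to hash its key. The sketch below assumes four shards and uses SHA-256 so the mapping is stable across processes (Python's built-in `hash()` is randomized per process for strings). The function name and shard count are illustrative.

```python
import hashlib

SHARD_COUNT = 4  # assumed number of database shards


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


print(shard_for("user:123"))
```

Part of the implementation complexity shows up here: with plain modulo hashing, changing `SHARD_COUNT` remaps most keys, which is why schemes such as consistent hashing are often used instead.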
CQRS:
- Command Query Responsibility Segregation
- Separate models for read and write
- Optimized for different access patterns
Caching Strategies
Cache-Aside Pattern
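def get_user(user_id):
    # `cache` and `db` are assumed to be pre-configured client objects
    key = f"user:{user_id}"
    # Check the cache first
    user = cache.get(key)
    if user is not None:
        return user
    # Cache miss: load from the database
    user = db.get_user(user_id)
    if user is not None:
        # Populate the cache with a one-hour TTL
        cache.set(key, user, ttl=3600)
    return user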
Write-Through Cache
- Write to the cache and the database synchronously, as part of the same operation
- Keeps the cache consistent with the database
- Higher write latency
Write-Behind Cache
- Write to cache immediately
- Asynchronously write to database
- Better write performance
- Risk of data loss
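The difference between the two write strategies can be sketched with a plain dict standing in for the database. Both class names are invented for this example, and the explicit `flush()` call stands in for the background writer a real system would run.

```python
from collections import deque


class WriteThroughCache:
    """Every write updates the backing store and the cache before returning."""

    def __init__(self, store):
        self.store = store  # dict standing in for the database
        self.cache = {}

    def set(self, key, value):
        self.store[key] = value  # synchronous database write (adds write latency)
        self.cache[key] = value


class WriteBehindCache:
    """Writes update the cache immediately; the store is updated on flush()."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.pending = deque()  # writes not yet persisted; lost if the process dies

    def set(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        """Persist queued writes in order; normally triggered asynchronously."""
        while self.pending:
            key, value = self.pending.popleft()
            self.store[key] = value
```

Until `flush()` runs, the write-behind store lags the cache. That gap is the data-loss window listed above.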
Load Balancing
Algorithms
- Round Robin: Sequential distribution
- Least Connections: Route to least busy server
- IP Hash: Hash the client IP to pick a server, so the same client consistently reaches the same server
- Weighted Round Robin: Consider server capacity
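Two of these algorithms are simple enough to sketch directly. The class and method names below are illustrative, and the server names are placeholders.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server that currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by the order in which servers were registered.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to `server` is closed.
        self.active[server] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])
```

Round robin needs no feedback from the servers. Least connections requires the balancer to track when each connection ends, which is why it exposes `release`.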
Types
- Layer 4: Transport layer (TCP/UDP)
- Layer 7: Application layer (HTTP)
Security Considerations
Authentication & Authorization
- OAuth 2.0
- JSON Web Tokens (JWT)
- Role-based access control (RBAC)
- Multi-factor authentication
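As an illustration of role-based access control, the check below grants a permission if any of the user's roles includes it. The role names and permissions are hypothetical. Real systems usually load this mapping from a policy store rather than hard-coding it.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(roles, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_allowed(["editor"], "write"))
print(is_allowed(["viewer"], "delete"))
```

Unknown roles grant nothing, so the check denies by default.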
Data Protection
- Encryption at rest
- Encryption in transit
- Data masking
- Access logging
Network Security
- Firewalls
- DDoS protection
- Rate limiting
- SSL/TLS certificates
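Rate limiting is often implemented with a token bucket. The following is a minimal single-process sketch. The class name, parameters, and injectable `clock` are assumptions for this example, and a distributed deployment would need shared state, for instance in a cache.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False  # caller should reject, e.g. with HTTP 429
```

One bucket per client (keyed by API key or IP address) keeps a single noisy client from exhausting capacity for everyone else.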
Monitoring and Observability
The Three Pillars
- Logs: Event records
- Metrics: Numerical measurements
- Traces: Request flows
Key Metrics
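- SLA: Service Level Agreement, a contractual commitment to customers, often with penalties if it is missed
- SLO: Service Level Objective, an internal target for an indicator, such as 99.9% of requests succeeding
- SLI: Service Level Indicator, the measured value that an objective is checked against, such as the request success rate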
- Error Rate: Percentage of failed requests
- Latency: Response time percentiles
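The last two metrics can be computed directly from request records. This sketch uses the nearest-rank percentile method and a made-up request log. The 1% error-rate objective is likewise an assumed SLO, used only to show how an SLI is compared against a target.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request log: (latency in milliseconds, HTTP status code).
requests = [(12, 200), (15, 200), (11, 200), (250, 500), (14, 200),
            (13, 200), (18, 200), (16, 200), (900, 503), (17, 200)]

latencies = [latency for latency, _ in requests]
# SLI: fraction of requests that failed with a server error.
error_rate = sum(status >= 500 for _, status in requests) / len(requests)

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
slo_met = error_rate <= 0.01  # assumed SLO: at most 1% failed requests

print(p50, p99, error_rate, slo_met)
```

The gap between the median and the 99th percentile in this sample shows why averages alone hide tail latency.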
Design Examples
URL Shortener
Requirements:
- Generate short URLs
- Redirect to original URLs
- Handle high traffic
- Analytics and tracking
Design:
- Hash function for URL generation
- Distributed cache for hot URLs
- Database for persistence
- CDN for global performance
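One possible reading of the hash-based generation step is sketched below: hash the URL, then encode part of the digest in base 62. The 7-character length, SHA-256, and the function names are assumptions. Because the digest is truncated, two URLs can collide, so a real service would check the database before accepting a code.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe symbols


def base62(number):
    """Encode a non-negative integer using the 62-symbol alphabet."""
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def short_code(url, length=7):
    """Derive a fixed-length code from a hash of the URL (collisions are possible)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    number = int.from_bytes(digest[:8], "big")
    # Pad short encodings, then keep the leading `length` characters.
    return base62(number).rjust(length, ALPHABET[0])[:length]


print(short_code("https://example.com/some/long/path"))
```

Seven base-62 characters give 62^7 (about 3.5 trillion) possible codes. An alternative that avoids collisions entirely is to base-62 encode a unique, monotonically increasing ID.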
Messaging System
Requirements:
- Real-time messaging
- Group chats
- Message history
- Online status
Design:
- WebSocket connections
- Message queues
- NoSQL for messages
- Redis for online status
Best Practices
- Start Simple: Begin with a monolith, evolve as needed
- Design for Failure: Assume components will fail
- Measure Everything: Collect comprehensive metrics
- Automate Everything: Deployment, testing, monitoring
- Security First: Design with security in mind
- Document Decisions: Record architectural choices
- Review Regularly: Architecture should evolve
System design is both an art and a science, requiring a balance between competing requirements and constraints.