
 Kubernetes for Machine Learning

Even ๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—ก๐—œ๐—  ๐—ถ๐˜€ ๐—ฟ๐˜‚๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ผ๐—ป ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€, what is ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€ and why is it worth learning ๐—ฎ๐˜€ ๐— ๐—Ÿ๐—ข๐—ฝ๐˜€/๐— ๐—Ÿ/๐——๐—ฎ๐˜๐—ฎ ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ๐—ฒ๐—ฟ?

๐Ÿ‘‰ Today we look into the Kubernetes system from a birdโ€™s eye view.

๐—ฆ๐—ผ, ๐˜„๐—ต๐—ฎ๐˜ ๐—ถ๐˜€ ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€ (๐—ž๐Ÿด๐˜€)?

  1. It is a container orchestrator that schedules, runs, and recovers your containerised applications in a horizontally scalable and self-healing way.

Kubernetes architecture consists of two main logical groups:

  1. Control plane - this is where the K8s system processes live: the components responsible for scheduling the workloads you define and keeping the system healthy.
  2. Worker nodes - this is where containers are scheduled and run.

๐—›๐—ผ๐˜„ ๐—ฑ๐—ผ๐—ฒ๐˜€ ๐—ž๐˜‚๐—ฏ๐—ฒ๐—ฟ๐—ป๐—ฒ๐˜๐—ฒ๐˜€ ๐—ต๐—ฒ๐—น๐—ฝ ๐˜†๐—ผ๐˜‚?

  1. You can have thousands of Nodes in your K8s cluster (usually you only need tens of them), each of which can host multiple containers. Nodes can be added or removed from the cluster as needed. This enables unrivaled horizontal scalability.
  2. Kubernetes provides an easy-to-understand declarative interface for deploying applications. Your application deployment definition can be described in YAML and submitted to the cluster, and the system will continuously reconcile the cluster toward the desired state you declared.
  3. Users are empowered to create and own their application architecture within boundaries pre-defined by Cluster Administrators.
  • โœ… In most cases you can deploy multiple types of ML Applications into a single cluster, you donโ€™t need to care about which server to deploy to - K8s will take care of it.
  • โœ… You can request different amounts of dedicated machine resources per application.
  • โœ… If your application goes down - K8s will make sure that a desired number of replicas is always alive.
  • โœ… You can roll out new versions of the running application using multiple strategies - K8s will safely do it for you.
  • โœ… You can expose your ML Services for other Product Apps to use with few intuitive resource definitions.
  • โœ… โ€ฆ

โ—๏ธHaving said this, while it is a bliss to use, usually the operation of Kubernetes clusters is what is feared. It is a complex system.

โ—๏ธMaster Plane is an overhead, you need it even if you want to deploy a single small application.


 When to Use Kubernetes for Machine Learning

Retrieve: Decision framework for Kubernetes adoption.

```mermaid
graph TD
    A[Need Kubernetes?] --> B{User Traffic?}
    B -->|Low <100 req/s| C[Use EC2 Instance]
    B -->|High >100 req/s| D{ML Teams?}

    D -->|Few <12 people| E[Consider Simpler Solution]
    D -->|Many >12 people| F{Complexity?}

    F -->|Low| E
    F -->|High| G[Use Kubernetes]

    G --> H[Multiple Teams]
    G --> I[Microservices]
    G --> J[Complex Pipelines]

    style A fill:#e1f5ff
    style G fill:#d4edda
    style C fill:#fff3cd
```

Decision Criteria:

| Factor | Low Complexity | High Complexity |
| --- | --- | --- |
| User Traffic | <100 requests/second | >100 requests/second |
| ML Teams | <12 people | >12 people |
| Applications | Monolithic | Microservices |
| Infrastructure | Simple | Complex pipelines |

When NOT to Use Kubernetes:

  • โŒ Low user traffic (<100 req/s) - Use EC2 instances
  • โŒ Small team (<12 people) - Simpler solutions suffice
  • โŒ Simple applications - Monolithic architecture works
  • โŒ Single application - Overhead not justified

When to Use Kubernetes:

  • โœ… High traffic requiring auto-scaling
  • โœ… Multiple ML teams with independent deployments
  • โœ… Complex microservices architecture
  • โœ… Need for resource isolation and management
  • โœ… Multiple applications requiring orchestration

Key Insight: Most likely, you don't need it! When a technology is hot, there's a tendency to disregard why the tool is actually useful. Very few companies have the complexity that justifies Kubernetes. If you have fewer than 12 people deploying models, consider simpler solutions.


 Kubernetes Scaling Strategies


Innovate: Multiple scaling approaches for different needs.

```mermaid
graph TB
    A[Kubernetes Scaling] --> B[Horizontal Pod Autoscaling]
    A --> C[Vertical Pod Autoscaling]
    A --> D[Cluster Autoscaling]
    A --> E[Manual Scaling]
    A --> F[Predictive Scaling]
    A --> G[Custom Metrics Scaling]

    style A fill:#e1f5ff
    style B fill:#fff3cd
    style C fill:#d4edda
    style D fill:#f8d7da
```

Scaling Strategies Comparison

| Strategy | Function | Trigger | Use Case |
| --- | --- | --- | --- |
| HPA | Adjust pod replicas | CPU/Memory metrics | Variable load |
| VPA | Adjust resource limits | Resource usage | Right-sizing |
| Cluster Autoscaling | Adjust node count | Pending pods | Capacity management |
| Manual Scaling | Manual replica adjustment | User command | Planned changes |
| Predictive Scaling | ML-based prediction | Forecasted load | Proactive scaling |
| Custom Metrics | Application-specific | Custom metrics | Business metrics |

HPA Workflow:

```mermaid
graph LR
    A[Metrics Server] --> B[API Server]
    B --> C[HPA Controller]
    C --> D{CPU/Memory > Threshold?}
    D -->|Yes| E[Scale Up Pods]
    D -->|No| F{CPU/Memory < Threshold?}
    F -->|Yes| G[Scale Down Pods]
    F -->|No| H[Maintain Current]

    style A fill:#e1f5ff
    style C fill:#fff3cd
    style E fill:#d4edda
```
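
The HPA loop above can be expressed as a declarative resource. A minimal sketch, assuming a target Deployment named `ml-scorer` and an illustrative 70% CPU threshold (both are assumptions, not from the post):

```yaml
# HorizontalPodAutoscaler: the HPA controller compares live metrics from the
# Metrics Server against the target below and adjusts the Deployment's
# replica count between the min and max bounds.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-scorer-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-scorer              # hypothetical target deployment
  minReplicas: 2                 # never scale below two replicas
  maxReplicas: 10                # cap on scale-up
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale up above 70% average CPU
```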

Detailed Scaling Strategies:

  1. Horizontal Pod Autoscaling (HPA)
    • Function: Adjusts the number of pod replicas based on CPU/memory usage or other select metrics
    • Workflow: Metrics Server โ†’ API Server โ†’ HPA Controller โ†’ Scale pods
  2. Vertical Pod Autoscaling (VPA)
    • Function: Adjusts resource limits and requests (CPU/memory) for containers within pods
    • Workflow: Metrics Server โ†’ API Server โ†’ VPA Controller โ†’ Adjust resources
  3. Cluster Autoscaling
    • Function: Adjusts the number of nodes in the cluster to ensure pods can be scheduled
    • Workflow: Scheduler โ†’ Cluster Autoscaler โ†’ Add/remove nodes
  4. Manual Scaling
    • Function: Manually adjusts the number of pod replicas
    • Workflow: kubectl command โ†’ API Server โ†’ Adjust pod count
  5. Predictive Scaling
    • Function: Uses ML models to predict future workloads and scales resources proactively
    • Workflow: ML Forecast โ†’ KEDA โ†’ Cluster Controller โ†’ Scale resources
  6. Custom Metrics Based Scaling
    • Function: Scales pods based on custom application-specific metrics
    • Workflow: Custom Metrics Server โ†’ HPA Controller โ†’ Scale deployment
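
For the KEDA-based strategies (5 and 6), the scaling rule is typically declared as a `ScaledObject`. A sketch under stated assumptions - the target deployment, Prometheus address, query, and threshold are all illustrative, not from the post:

```yaml
# KEDA ScaledObject: scales a deployment on an application-specific metric
# (here, a Prometheus request-rate query) instead of raw CPU/memory.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ml-scorer-scaler         # hypothetical name
spec:
  scaleTargetRef:
    name: ml-scorer              # hypothetical target deployment
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # illustrative address
        query: sum(rate(inference_requests_total[2m]))    # illustrative metric
        threshold: "100"         # target requests/second per replica
```

A predictive setup would follow the same shape, with the trigger pointing at a metric produced by the forecasting model rather than at live traffic.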

Key Takeaways

Retrieve: Kubernetes is a powerful container orchestrator that provides horizontal scalability, self-healing, and declarative deployment for ML applications.

Innovate: Use Kubernetes when you have high traffic, multiple teams, and complex microservices. For simpler scenarios, consider EC2 instances or simpler container solutions.

Curiosity โ†’ Retrieve โ†’ Innovation: Start with curiosity about container orchestration, retrieve knowledge of Kubernetes architecture and scaling strategies, and innovate by applying it to your ML infrastructure when complexity justifies the overhead.


This post is licensed under CC BY 4.0 by the author.