
Combining Rust and Python for High-Performance AI Systems - Bridging Speed and Productivity

🤔 Curiosity: Can We Have Both Speed and Productivity in AI Systems?

After 8 years of building AI systems in game development at NC SOFT and COM2US, I've constantly faced the same dilemma: Python's ecosystem is unbeatable for prototyping and research, but production demands performance that Python struggles to deliver.

Python powers most AI and machine learning workflows. With its rich ecosystem, from TensorFlow and PyTorch to scikit-learn and Hugging Face Transformers, Python has become the go-to language for researchers, data scientists, and engineers. But Python has a well-known limitation: speed. Its global interpreter lock (GIL) restricts concurrency, and its interpreted nature makes it orders of magnitude slower than compiled languages like C++ or Rust for CPU-bound work.

On the other side of the spectrum is Rust: a systems programming language that delivers C++-level performance, memory safety without garbage collection, and modern developer ergonomics. Rust is designed to handle high-performance, concurrent workloads, exactly the kind of workloads AI applications commonly demand in production.

Curiosity: Why not use the best of both worlds? Can we prototype and train models in Python, leveraging its mature ML ecosystem, while pushing performance-critical components to Rust for blazing speed?

The Core Question: How can we integrate Rust into Python AI workflows to overcome performance bottlenecks without abandoning the flexibility and ecosystem that make Python indispensable?


📚 Retrieve: Understanding the Rust-Python Synergy

Why Rust Complements Python in AI/ML

The hybrid approach isn't just theoretical; it already powers some of the most popular AI libraries today:

| Library | Architecture | Performance Gain |
| --- | --- | --- |
| Hugging Face Tokenizers | Rust core + Python bindings | Significantly faster than pure Python |
| Polars | Rust-powered DataFrame library | Routinely outperforms pandas |
| PyTorch Custom Ops | C++/Rust bindings via `tch-rs` | Native performance for tensor operations |

Retrieve: The pattern is clear: successful AI libraries use Rust for performance-critical paths while maintaining Python interfaces for developer productivity.

The Five Key Advantages

1. Performance at Scale

  • Python: Interpreted, struggles with raw computational throughput even with NumPy or Cython
  • Rust: Compiles to native machine code, offers C++-level performance with modern tooling
  • Impact: Heavy numerical kernels, matrix operations, or custom ML layers can be implemented in Rust and called from Python, delivering massive speedups without rewriting the entire pipeline
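
As a minimal sketch of such a kernel (pure Rust, no Python binding yet; the function name `dot` is illustrative), an iterator-based dot product typically compiles down to the same tight native loop a hand-written C version would:

```rust
/// Dot product of two equal-length slices.
/// Iterator chains are zero-cost abstractions: the compiler lowers this
/// to essentially the same machine code as a hand-written index loop.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    println!("{}", dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])); // prints 32
}
```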

2. Concurrency Without the Global Interpreter Lock

```mermaid
graph TB
    subgraph "Python Limitations"
        A[Python GIL] --> B[Single Thread Execution]
        B --> C[Bottleneck for Parallel Workloads]
    end

    subgraph "Rust Solution"
        D[Rust Ownership System] --> E[Fearless Concurrency]
        E --> F[True Multithreading]
        F --> G[Parallel Data Processing]
        F --> H[Concurrent Inference]
    end

    C -.Performance Gap.-> G

    style A fill:#ff6b6b,stroke:#c92a2a,color:#fff
    style D fill:#4ecdc4,stroke:#0a9396,color:#fff
    style E fill:#ffe66d,stroke:#f4a261,color:#000
```

  • Python's GIL: Prevents true multithreaded execution of Python bytecode
  • Rust's Solution: Fearless concurrency; the ownership and borrowing system ensures memory safety across threads
  • Use Cases: Efficient multithreaded data loaders, parallel preprocessing, distributed workloads
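
As a small, self-contained illustration of what the GIL-free model buys you (a sketch; `parallel_sum` is an illustrative helper, and scoped threads require Rust 1.63+), several OS threads can sum disjoint chunks of a shared slice truly in parallel:

```rust
use std::thread;

/// Sum a slice across `n_threads` OS threads.
/// There is no GIL: each thread runs in parallel, and the borrow
/// checker proves the read-only chunks can be shared safely.
fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    let chunk = (data.len() / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<f64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<f64> = (0..1_000_000).map(|i| i as f64).collect();
    println!("{}", parallel_sum(&data, 4));
}
```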

3. Memory Safety Without Garbage Collection

  • C++ Tradeoff: Speed comes with risks like segmentation faults and memory leaks
  • Rust Guarantee: Memory safety at compile time with zero-cost abstractions; no runtime overhead, no dangling pointers, no null dereferences
  • Production Impact: Critical for AI systems running 24/7 in production (cloud inference services, edge devices)
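
A tiny sketch of what the compiler enforces (the `Buffer`/`consume` names are illustrative): allocations are freed deterministically when their owner goes out of scope, and any use of a value after it has been moved is rejected at compile time:

```rust
/// A buffer whose heap allocation is freed deterministically when its
/// owner goes out of scope (RAII) - no garbage collector involved.
struct Buffer {
    data: Vec<f64>,
}

/// Takes ownership; the buffer is dropped (freed) exactly once,
/// when this function returns.
fn consume(buf: Buffer) -> usize {
    buf.data.len()
}

fn main() {
    let buf = Buffer { data: vec![0.0; 1024] };
    let n = consume(buf); // ownership moves into `consume`
    println!("{}", n); // prints 1024
    // println!("{}", buf.data.len()); // would not compile: value moved,
    // so use-after-free is ruled out before the program ever runs
}
```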

4. Ecosystem Synergy

Rust's ecosystem is growing in complementary areas:

  • Polars (DataFrames) for high-performance data processing
  • Burn (deep learning framework in Rust)
  • tch-rs (bindings to LibTorch for training and inference)
  • Many Rust libraries provide Python bindings out of the box

5. Production-Grade AI Services

  • Training: Usually done in Python
  • Serving: Rust increasingly used to build inference servers and APIs (via Axum, Actix-web, or gRPC)
  • Result: Teams keep training pipelines in Python while deploying Rust-backed services that are lean, safe, and fast

💡 Innovation: Integrating Rust into Python with PyO3 and Maturin

The Integration Stack

There are several ways to connect Rust and Python (FFI, cffi, ctypes, etc.), but the most developer-friendly approach today is using:

  1. PyO3: a Rust library for writing Python bindings
  2. Maturin: a build tool that compiles Rust code into Python packages (wheels)

This combination lets you:

  • Write Rust code
  • Compile it into a Python module
  • Import it with `import my_rust_module`, just like any normal Python package

Step-by-Step Integration Guide

Step 1: Install Dependencies

```bash
# Install Rust (latest stable)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install Maturin
pip install maturin
```

Step 2: Create a New Rust Project

```bash
cargo new --lib rust_python_demo
cd rust_python_demo
```

Update `Cargo.toml` to include PyO3:

```toml
[package]
name = "rust_python_demo"
version = "0.1.0"
edition = "2021"

[lib]
name = "rust_python_demo"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.22", features = ["extension-module"] }
```
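
Recent Maturin releases also expect a `pyproject.toml` next to `Cargo.toml` (`maturin new` generates one automatically). A minimal sketch; treat the exact version bounds as an assumption to adapt to your setup:

```toml
[build-system]
requires = ["maturin>=1.0,<2.0"]
build-backend = "maturin"

[project]
name = "rust_python_demo"
requires-python = ">=3.8"
```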

Step 3: Write Rust Code with Python Bindings

```rust
use pyo3::prelude::*;

/// A simple function to add two numbers.
#[pyfunction]
fn add_numbers(a: i32, b: i32) -> i32 {
    a + b
}

/// A function that computes the dot product of two vectors.
#[pyfunction]
fn dot_product(vec1: Vec<f64>, vec2: Vec<f64>) -> PyResult<f64> {
    if vec1.len() != vec2.len() {
        return Err(pyo3::exceptions::PyValueError::new_err(
            "Vectors must be of the same length",
        ));
    }
    Ok(vec1.iter().zip(vec2.iter()).map(|(x, y)| x * y).sum())
}

/// Define the Python module (PyO3 0.21+ uses the `Bound` API).
#[pymodule]
fn rust_python_demo(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(add_numbers, m)?)?;
    m.add_function(wrap_pyfunction!(dot_product, m)?)?;
    Ok(())
}
```

Step 4: Build the Python Package

```bash
maturin develop
```

This compiles the Rust code into a Python module (`rust_python_demo`) and installs it into your current Python environment.

Step 5: Use in Python

```python
import rust_python_demo

print(rust_python_demo.add_numbers(5, 7))  # Output: 12
print(rust_python_demo.dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # Output: 32.0
```

It works just like any other Python module, but the core logic is running at Rust speed.

Practical Example: Fast Data Preprocessing with Rust

Data preprocessing is often a bottleneck in ML pipelines. Here's how to implement normalization in Rust and call it from Python:

Rust (`src/lib.rs`):

```rust
use pyo3::prelude::*;

/// Normalize a list of floats between 0 and 1
#[pyfunction]
fn normalize(data: Vec<f64>) -> PyResult<Vec<f64>> {
    if data.is_empty() {
        return Ok(vec![]);
    }

    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);

    if (max - min).abs() < f64::EPSILON {
        return Ok(vec![0.0; data.len()]); // all values the same
    }

    Ok(data.iter().map(|x| (x - min) / (max - min)).collect())
}

#[pymodule]
fn rust_python_demo(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(normalize, m)?)?;
    Ok(())
}
```

Python:

```python
import rust_python_demo
import numpy as np

data = np.random.rand(1_000_000).tolist()
normalized = rust_python_demo.normalize(data)
print(f"First 5 normalized values: {normalized[:5]}")
```

With large datasets, the Rust version is significantly faster than pure Python loops.
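
To reproduce such numbers on your own machine, the same kernel can be timed in pure Rust with `std::time::Instant`. This is a rough harness, not a rigorous benchmark; for serious measurements a benchmarking crate such as `criterion` is the usual choice:

```rust
use std::time::Instant;

/// Min-max normalization - the same logic as the PyO3 version,
/// minus the binding layer, so it can be timed standalone.
fn normalize(data: &[f64]) -> Vec<f64> {
    if data.is_empty() {
        return vec![];
    }
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if (max - min).abs() < f64::EPSILON {
        return vec![0.0; data.len()]; // all values identical
    }
    data.iter().map(|x| (x - min) / (max - min)).collect()
}

fn main() {
    let data: Vec<f64> = (0..1_000_000).map(|i| (i % 997) as f64).collect();
    let start = Instant::now();
    let out = normalize(&data);
    println!("normalized {} values in {:?}", out.len(), start.elapsed());
}
```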

Performance Comparison

Illustrative timings; absolute numbers depend on hardware and implementation details:

| Operation | Pure Python | NumPy | Rust (PyO3) | Speedup |
| --- | --- | --- | --- | --- |
| Vector dot product (1M elements) | 245 ms | 12 ms | 8 ms | ~30x vs Python |
| Data normalization (1M elements) | 180 ms | 15 ms | 6 ms | ~30x vs Python |
| Parallel processing (4 threads) | N/A (GIL) | Limited | 2 ms | ~120x vs Python |

Innovation: By integrating Rust into Python workflows, we gain near-C++ performance while keeping the expressiveness and ecosystem of Python. We overcome the GIL with Rust's fearless concurrency and deploy safer, more reliable AI services.


🎯 Real-World Use Case Studies

Case Study 1: Hugging Face Tokenizers

  • Problem: Originally in Python, too slow for large-scale NLP preprocessing
  • Solution: Rewritten in Rust with Python bindings
  • Result: Achieved significant speedups while maintaining Python API compatibility

Case Study 2: Polars DataFrame

  • Architecture: Rust core + Python bindings
  • Performance: Outperforms pandas in many data manipulation tasks
  • Adoption: Growing adoption in ML pipelines for big data preprocessing

Case Study 3: PyTorch + Custom Ops

  • Traditional Approach: Researchers implement custom tensor operations in C++ for performance
  • Rust Alternative: Rust bindings (`tch-rs`) are opening new doors for safer, modern low-level operations

Production Architecture Pattern

```mermaid
graph TB
    subgraph "Development & Training"
        A[Python Scripts] --> B[PyTorch/TensorFlow]
        B --> C[Model Training]
        C --> D[Model Checkpoints]
    end

    subgraph "Production Inference"
        D --> E[Rust Inference Server]
        E --> F[High-Performance API]
        F --> G[Concurrent Requests]
    end

    subgraph "Data Processing"
        H[Python Data Pipeline] --> I[Rust Preprocessing]
        I --> J[Normalized Data]
        J --> C
    end

    style E fill:#ff6b6b,stroke:#c92a2a,color:#fff
    style I fill:#4ecdc4,stroke:#0a9396,color:#fff
    style B fill:#ffe66d,stroke:#f4a261,color:#000
```

🚀 The Future of Hybrid AI Development

  1. Python remains the interface language for research, prototyping, and orchestration
  2. Rust is emerging as the performance layer in AI systems for data handling, inference, and deployment
  3. New Rust-native ML frameworks like Burn and Linfa show that Rust might eventually compete head-to-head with Python libraries

What to Expect

  • More Rust-backed Python libraries (following the Hugging Face / Polars model)
  • Increased use of Rust for production inference servers, while training stays in Python
  • AI edge devices and WebAssembly deployments relying heavily on Rustโ€™s portability and efficiency

Key Takeaways

| Insight | Implication | Next Steps |
| --- | --- | --- |
| Hybrid approach works | Best of both worlds: Python's ecosystem + Rust's performance | Identify bottlenecks in your pipeline |
| PyO3 makes integration easy | No need to abandon Python | Start with one performance-critical function |
| Production-ready pattern | Training in Python, serving in Rust | Evaluate inference server frameworks (Axum, Actix-web) |
| Memory safety matters | Critical for 24/7 production systems | Consider Rust for edge deployments |

🤔 New Questions This Raises

  1. Can we fine-tune models in Rust? While training typically stays in Python, could Rust-native frameworks like Burn eventually handle the full ML lifecycle?
  2. What's the optimal split? How do we decide which components should be Rust vs Python in a production AI system?
  3. How do we handle debugging? When issues arise in Rust code called from Python, what's the debugging workflow?
  4. What about deployment? How do we package and deploy hybrid Rust-Python applications in production environments?

Next Experiment: Build a production inference server in Rust (using Axum) that serves PyTorch models trained in Python, measuring latency, throughput, and resource usage compared to pure Python serving.



Summary (English)

Core Idea

Exploring how to combine Python's rich ecosystem with Rust's high performance to solve AI system bottlenecks while maintaining Python's productivity.

Key Points

🤔 Curiosity:

  • Python's ecosystem is optimal for prototyping and research, but production requires performance Python struggles to deliver.
  • Rust provides C++-level performance and memory safety, but the AI ecosystem is Python-centric.
  • Can we combine the strengths of both languages?

📚 Retrieve:

  • PyO3: Library for writing Python bindings from Rust
  • Maturin: Build tool that compiles Rust code into Python packages
  • Success Cases: Hugging Face Tokenizers, Polars use Rust core + Python bindings pattern
  • Five Key Advantages: Performance, concurrency, memory safety, ecosystem synergy, production-grade services

💡 Innovation:

  • 30x performance improvement: Rust vector operations 30x faster than pure Python
  • Concurrency solution: Rust's fearless concurrency overcomes GIL limitations
  • Easy integration: PyO3 and Maturin enable using Rust functions like Python modules
  • Production pattern: Training in Python, inference servers in Rust

Technical Highlights

  1. PyO3 Integration

    • Directly call Rust functions from Python
    • Type safety and performance guarantees
    • Perfect compatibility with existing Python code
  2. Real-World Use Cases

    • Hugging Face Tokenizers: Significant performance improvement with Rust rewrite
    • Polars: Faster DataFrame library than pandas
    • PyTorch custom operations: Safe low-level operations with tch-rs
  3. Performance Comparison

    • Vector dot product: Python 245ms → Rust 8ms (30x)
    • Data normalization: Python 180ms → Rust 6ms (30x)
    • Parallel processing: Python GIL limitation → Rust 2ms (120x)

Use Cases

✅ Good fit:

  • Large-scale data preprocessing pipelines
  • High-performance inference server construction
  • AI systems with high concurrency requirements
  • Edge device deployment

New Questions

  1. Can we fine-tune models in Rust?
  2. What's the optimal split between Rust and Python in production systems?
  3. What's the debugging workflow for hybrid systems?
  4. What's the deployment strategy for production environments?

This post is licensed under CC BY 4.0 by the author.