Model-Native Agentic AI: The Paradigm Shift from Pipelines to Learned Intelligence

🤔 Curiosity: What If AI Agents Could Learn Intelligence Instead of Being Programmed?

What if instead of orchestrating planning, tool use, and memory through external logic and pipelines, these capabilities could be internalized within the model’s parameters? What if agents could learn to reason, act, and remember through experience rather than following pre-scripted rules?

Curiosity: We’ve built AI agents by connecting LLMs to external modules for planning, tool use, and memory. But what if the model itself could learn these capabilities end-to-end? How would that change the nature of agentic AI?

The rapid evolution of agentic AI marks a new phase in artificial intelligence, where Large Language Models (LLMs) no longer merely respond but act, reason, and adapt. A recent comprehensive survey traces a fundamental paradigm shift: from Pipeline-based systems, where planning, tool use, and memory are orchestrated by external logic, to the emerging Model-native paradigm, where these capabilities are internalized within the model’s parameters.

The questions: How is this paradigm shift happening? What enables models to learn agentic capabilities rather than having them programmed? And what does this mean for the future of AI agents?

As someone who has built both pipeline-based and learning-based AI systems, I find this shift profound: we’re moving from constructing systems that apply intelligence to developing models that grow intelligence through experience.


📚 Retrieve: Understanding the Paradigm Shift

Pipeline-based vs. Model-native: The Fundamental Difference

Pipeline-based Paradigm:

In traditional agentic AI systems, capabilities are externally orchestrated:

graph TB
    subgraph Pipeline["Pipeline-based Agent"]
        I[Input] --> LLM[LLM]
        LLM -->|Generate Plan| P[External Planner Module]
        P -->|Select Tools| T[Tool Selection Module]
        T -->|Execute| E[Tool Execution]
        E -->|Store Results| M[External Memory Store]
        M -->|Retrieve| LLM
        LLM --> O[Output]
    end
    
    style P fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style T fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff
    style M fill:#ff6b6b,stroke:#c92a2a,stroke-width:2px,color:#fff

Characteristics:

  • Planning logic is scripted (e.g., PDDL, MCTS, Tree of Thoughts)
  • Tool use is explicitly called via function calling APIs
  • Memory is external (vector databases, key-value stores)
  • Components are modular and interpretable
  • Requires engineering to connect components

Model-native Paradigm:

In model-native systems, capabilities are learned end-to-end:

graph TB
    subgraph ModelNative["Model-native Agent"]
        I2[Input] --> LLM2[LLM + RL Training]
        LLM2 -->|Learned Planning| P2[Internal Planning]
        P2 -->|Learned Tool Use| T2[Internal Tool Selection]
        T2 -->|Learned Memory| M2[Internal Memory]
        M2 --> LLM2
        LLM2 --> O2[Output]
    end
    
    style LLM2 fill:#4ecdc4,stroke:#0a9396,stroke-width:3px,color:#fff
    style P2 fill:#ffe66d,stroke:#f4a261,stroke-width:2px,color:#000
    style T2 fill:#ffe66d,stroke:#f4a261,stroke-width:2px,color:#000
    style M2 fill:#ffe66d,stroke:#f4a261,stroke-width:2px,color:#000

Characteristics:

  • Planning is learned through reinforcement learning
  • Tool use is internalized within model parameters
  • Memory is encoded in model weights and activations
  • Capabilities emerge from training, not engineering
  • More unified but less interpretable

The Role of Reinforcement Learning

RL as the Algorithmic Engine:

The survey positions Reinforcement Learning (RL) as the key enabler of this paradigm shift. By reframing learning from imitating static data to outcome-driven exploration, RL provides a unified LLM + RL + Task framework that spans language, vision, and embodied domains.

Why RL Matters:

| Aspect | Supervised Learning | Reinforcement Learning |
| --- | --- | --- |
| Learning signal | Static examples | Outcome-driven feedback |
| Exploration | Limited to training data | Can explore new strategies |
| Adaptation | Fixed behavior | Learns from trial and error |
| Capability | Pattern matching | Strategic decision-making |

The RL Framework for Agentic AI:

# Conceptual RL framework for agentic AI
class AgenticRL:
    """
    Curiosity: How does RL enable model-native capabilities?
    Retrieve: RL reframes learning from imitation to outcome-driven exploration
    Innovation: Unified LLM + RL + Task framework for agentic AI
    """
    
    def __init__(self, llm, task_env):
        self.llm = llm
        self.task_env = task_env
        self.reward_model = RewardModel()
    
    def train(self, episodes=1000):
        """
        Train agent through RL to learn planning, tool use, memory
        """
        for episode in range(episodes):
            # Agent generates actions (planning, tool use, memory access)
            trajectory = self.generate_trajectory()
            
            # Evaluate outcome
            reward = self.reward_model.evaluate(trajectory)
            
            # Update model based on outcome
            self.llm.update_with_reward(trajectory, reward)
    
    def generate_trajectory(self):
        """
        Agent generates sequence of actions
        - Planning: Internal reasoning steps
        - Tool use: Learned tool selection
        - Memory: Learned memory access
        """
        state = self.task_env.reset()
        trajectory = []
        done = False
        
        while not done:
            # Model generates the next action (learned, not scripted)
            action = self.llm.act(state)
            trajectory.append(action)
            
            # Execute in the environment; the per-step reward is ignored here
            # because the full trajectory is scored by the reward model in train()
            state, _, done = self.task_env.step(action)
        
        return trajectory
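
The `update_with_reward` call above is deliberately abstract. To make the idea of outcome-driven updates concrete, here is a self-contained REINFORCE sketch on a toy two-action choice (answer directly vs. call a tool); the task, reward probabilities, and learning rate are invented for illustration and are not part of the survey.

# Illustrative only, not the survey's method: a pure-Python REINFORCE update on a
# toy two-action choice, showing what an outcome-driven update can look like.
import math
import random

logits = [0.0, 0.0]        # action 0 = answer directly, action 1 = call a tool
LEARNING_RATE = 0.1

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def toy_reward(action):
    # Assumed environment: the tool-using action succeeds more often.
    success_prob = 0.8 if action == 1 else 0.3
    return 1.0 if random.random() < success_prob else 0.0

for episode in range(2000):
    probs = softmax(logits)
    action = random.choices([0, 1], weights=probs, k=1)[0]
    reward = toy_reward(action)
    # REINFORCE: raise the log-probability of the sampled action in proportion
    # to the reward it earned (gradient of log-softmax w.r.t. each logit).
    for a in range(len(logits)):
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += LEARNING_RATE * reward * grad

print("Learned action probabilities:", softmax(logits))  # tool use should dominate

Run long enough, the policy shifts toward the tool-using action simply because that action is rewarded more often; this is the same outcome-driven shaping that the conceptual framework above relies on, scaled down to two actions.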

💡 Innovation: Evolution of Core Capabilities

1. Planning: From Scripted to Learned

Pipeline-based Planning:

Early approaches used external planning modules:

  • STRIPS (1971): Classical planning with preconditions and effects
  • PDDL (1998): Planning Domain Definition Language
  • LLM+P (2023): LLMs generate PDDL, external planner solves
  • Tree of Thoughts (2023): External tree search over LLM outputs
  • MCTS + LLM (2024): Monte Carlo Tree Search orchestrates planning

Example: Tree of Thoughts (Pipeline-based)

# Pipeline-based: External tree search
class TreeOfThoughts:
    def solve(self, problem):
        # LLM generates candidate thoughts
        thoughts = self.llm.generate_thoughts(problem)
        
        # External module evaluates and expands
        tree = self.build_tree(thoughts)
        
        # External search algorithm finds best path
        solution = self.search(tree)
        
        return solution

Model-native Planning:

Recent approaches learn planning through RL:

  • OpenAI o1 (2024): Learned reasoning through process supervision
  • DeepSeek R1 (2025): RL-based reasoning capability
  • ReST-MCTS* (2024): Self-training via process reward
  • QwQ-32B (2025): Reinforcement learning for reasoning

Example: o1-style Model-native Planning

# Model-native: Learned planning
class ModelNativePlanner:
    def __init__(self, model):
        # Model has learned planning internally
        self.model = model  # Trained with RL on planning tasks
    
    def solve(self, problem):
        # Model internally reasons/plans
        # No external planner needed
        solution = self.model.reason(problem)
        return solution

Key Papers in Model-native Planning:

| Model | Key Innovation | RL Approach |
| --- | --- | --- |
| OpenAI o1 | Process supervision for reasoning | Process reward model |
| DeepSeek R1 | Large-scale RL for reasoning | Process + outcome rewards |
| QwQ-32B | Open-source reasoning model | Reinforcement learning |
| L1 | Controlling reasoning length | RL with length control |
| DeepScaleR | Scaling RL to a 1.5B model | Efficient RL training |
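
The table contrasts process rewards with outcome rewards. As a rough illustration of that distinction, the sketch below scores a reasoning trace both ways; the `naive_scorer` is a hypothetical stand-in for a trained verifier, not any lab's actual process reward model.

# Illustrative sketch: outcome reward vs. process reward for a reasoning trace.
# The step scorer is a hypothetical stand-in for a trained verifier.
from typing import Callable, List

def outcome_reward(final_answer: str, reference: str) -> float:
    """One scalar for the whole trajectory: did the final answer match?"""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

def process_reward(steps: List[str], step_scorer: Callable[[str], float]) -> float:
    """Score every intermediate reasoning step, then aggregate."""
    if not steps:
        return 0.0
    return sum(step_scorer(step) for step in steps) / len(steps)

def naive_scorer(step: str) -> float:
    # Hypothetical heuristic; a real process reward model would be learned.
    return 1.0 if "therefore" in step.lower() else 0.5

trace = ["Let x be the unknown quantity.",
         "2x + 3 = 11, therefore x = 4."]
print(outcome_reward("x = 4", "x = 4"))       # 1.0
print(process_reward(trace, naive_scorer))    # 0.75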

2. Tool Use: From Function Calling to Learned Integration

Pipeline-based Tool Use:

Traditional approaches use explicit function calling:

  • ReAct (2022): LLM generates “Action: tool_name(args)”
  • AutoGen (2023): Multi-agent conversation with tool calls
  • SWE-agent (2024): Agent-computer interfaces for software engineering
  • Tool calling APIs: OpenAI Functions, Anthropic Tools

Example: Pipeline-based Tool Use

# Pipeline-based: Explicit tool calling
class PipelineToolAgent:
    def act(self, task):
        # LLM generates tool call as text
        response = self.llm.generate(f"Task: {task}")
        
        # External parser extracts tool call
        tool_call = self.parse_tool_call(response)
        # e.g., "Action: search(query='python tutorial')"
        
        # External executor runs tool
        result = self.tool_executor.execute(tool_call)
        
        # LLM processes result
        return self.llm.generate(f"Result: {result}")
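
The `parse_tool_call` step above is where pipeline systems are most brittle. A minimal sketch of such a parser is shown below, assuming a ReAct-style `Action: tool_name(args)` line; production systems typically rely on structured JSON function-calling payloads rather than regex parsing of free text.

# Minimal, assumption-laden sketch of parse_tool_call: extract a ReAct-style
# "Action: tool_name(args)" line with a regex. Real deployments usually use
# structured JSON function-calling outputs instead of free-text parsing.
import re

ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\((.*)\)")

def parse_tool_call(llm_output: str):
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        return None                          # the model did not request a tool
    tool_name, raw_args = match.groups()
    return {"tool": tool_name, "raw_args": raw_args}

print(parse_tool_call("Thought: I should look this up.\n"
                      "Action: search(query='python tutorial')"))
# {'tool': 'search', 'raw_args': "query='python tutorial'"}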

Model-native Tool Use:

Recent approaches learn tool use through RL:

  • ReTool (2025): Reinforcement learning for strategic tool use
  • ToolRL (2025): RL for tool-use in reasoning models
  • ToRL (2025): Tool-integrated reinforcement learning
  • R1-Searcher (2025): RL for search capability

Example: Model-native Tool Use

# Model-native: Learned tool use
class ModelNativeToolAgent:
    def __init__(self, model):
        # Model learned when/how to use tools
        self.model = model  # Trained with RL on tool-using tasks
    
    def act(self, task):
        # Model internally decides tool use
        # No explicit function calling needed
        result = self.model.act_with_tools(task)
        return result

Key Papers in Model-native Tool Use:

| Model | Key Innovation | Application |
| --- | --- | --- |
| ReTool | Strategic tool use via RL | General tool use |
| R1-Searcher | Search capability via RL | Information retrieval |
| DeepResearcher | Deep research via RL | Long-horizon research |
| Tool-N1 | General tool-use training | Multi-tool scenarios |
| Agent Lightning | Universal agent training | Any agent task |
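
What these methods share is a trajectory-level reward that values the final outcome while discouraging malformed or wasteful tool calls. The sketch below shows the general shape of such a reward; the specific penalty terms and weights are illustrative assumptions, not values taken from ReTool, ToolRL, or ToRL.

# Sketch of a trajectory-level reward for tool-use RL. The penalty terms and
# weights are illustrative assumptions, not values from any listed paper.
from dataclasses import dataclass
from typing import List

@dataclass
class ToolCall:
    name: str
    well_formed: bool

def tool_use_reward(final_correct: bool, calls: List[ToolCall],
                    call_budget: int = 4) -> float:
    reward = 1.0 if final_correct else 0.0
    reward -= 0.2 * sum(1 for c in calls if not c.well_formed)   # format penalty
    reward -= 0.05 * max(0, len(calls) - call_budget)            # efficiency penalty
    return reward

calls = [ToolCall("search", True), ToolCall("python", True)]
print(tool_use_reward(final_correct=True, calls=calls))  # 1.0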

3. Memory: From External Stores to Internal Representations

Pipeline-based Memory:

Traditional approaches use external memory systems:

  • Vector Databases: Pinecone, Weaviate, Chroma
  • Key-Value Stores: Redis, Memcached
  • RAG Systems: Retrieval-augmented generation
  • MemGPT (2023): External memory management system

Example: Pipeline-based Memory

# Pipeline-based: External memory
class PipelineMemoryAgent:
    def __init__(self):
        self.llm = LLM()
        self.memory_db = VectorDB()  # External
    
    def remember(self, information):
        # Store in external database
        self.memory_db.store(information)
    
    def recall(self, query):
        # Retrieve from external database
        relevant = self.memory_db.search(query)
        
        # LLM uses retrieved context
        return self.llm.generate(context=relevant)

Model-native Memory:

Recent approaches learn memory within the model:

  • Long Context Models: Qwen2.5-1M, UltraLLaDA (128K context)
  • FlashAttention: Efficient attention for long sequences
  • A-Mem (2025): Agentic memory for LLM agents
  • Memory-R1 (2025): RL for memory management

Example: Model-native Memory

# Model-native: Learned memory
class ModelNativeMemoryAgent:
    def __init__(self, model):
        # Model has learned memory internally
        self.model = model  # Trained with long context + RL
    
    def remember(self, information):
        # Memory encoded in model activations
        self.model.update_memory(information)
    
    def recall(self, query):
        # Model internally retrieves from learned memory
        return self.model.recall_and_generate(query)

Key Papers in Model-native Memory:

| Model | Key Innovation | Context / Memory |
| --- | --- | --- |
| Qwen2.5-1M | 1M token context | 1,000,000 tokens |
| UltraLLaDA | 128K context | 128,000 tokens |
| FlashAttention | Memory-efficient exact attention | Enables much longer contexts |
| A-Mem | Agentic memory | Learned memory access |
| Memory-R1 | RL for memory | Adaptive memory use |
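
One way to picture learned memory management (in the spirit of Memory-R1, though the operation names and logic here are assumptions rather than the paper's interface) is as a small action space of memory operations that the model chooses between, with RL crediting choices that help the downstream task:

# Hypothetical sketch: memory management as a small action space the model
# chooses from. Operation names and logic are illustrative, not Memory-R1's API.
from typing import Dict

MEMORY_OPS = ("ADD", "UPDATE", "DELETE", "NOOP")

def apply_memory_op(memory: Dict[str, str], op: str, key: str, value: str = "") -> None:
    if op == "ADD" and key not in memory:
        memory[key] = value
    elif op == "UPDATE" and key in memory:
        memory[key] = value
    elif op == "DELETE":
        memory.pop(key, None)
    # NOOP leaves memory untouched. In a model-native agent, the (op, key, value)
    # triple is emitted by the model, and the choice of op is trained with RL
    # against downstream task reward rather than hand-written rules.

memory: Dict[str, str] = {}
apply_memory_op(memory, "ADD", "user_goal", "book a flight to Tokyo")
apply_memory_op(memory, "UPDATE", "user_goal", "book a flight to Osaka")
print(memory)  # {'user_goal': 'book a flight to Osaka'}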

🎯 Applications: Deep Research and GUI Agents

Deep Research Agent

The Challenge:

Deep research requires long-horizon reasoning across multiple steps:

  1. Formulate research questions
  2. Search for information
  3. Synthesize findings
  4. Generate comprehensive reports

Pipeline-based Approach:

  • Query rewriting: External modules rewrite queries
  • RAG systems: External retrieval augments generation
  • Multi-step pipelines: Orchestrate search → synthesis → writing

Model-native Approach:

  • DeepResearcher (2025): RL-trained for deep research
  • WebThinker (2025): Deep research capability via RL
  • R1-Searcher (2025): Learned search capability

Key Innovation:

Models learn to strategically search and synthesize information through RL, rather than following scripted research workflows.
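
Concretely, an RL-trained research agent interleaves searching and answering inside a single generation loop. The sketch below uses assumed `<search>`/`<answer>` tags and stubbed model and search functions to show the shape of that loop; real systems learn when to stop searching from task reward rather than from hard-coded rules.

# Illustrative loop only: the <search>/<answer> tags and the stubbed model and
# search functions are assumptions, not any specific system's interface.
import re

def stub_model_step(context: str) -> str:
    # Stand-in for a policy that has learned when to search and when to answer.
    if "results:" not in context:
        return "<search>model-native agentic AI survey</search>"
    return "<answer>The survey describes a shift from pipelines to learned capabilities.</answer>"

def stub_search(query: str) -> str:
    return f"results: stub snippet for '{query}'"

def research(question: str, max_turns: int = 4) -> str:
    context = question
    for _ in range(max_turns):
        step = stub_model_step(context)
        search_match = re.search(r"<search>(.*?)</search>", step)
        if search_match:
            context += "\n" + stub_search(search_match.group(1))
            continue
        answer_match = re.search(r"<answer>(.*?)</answer>", step)
        if answer_match:
            return answer_match.group(1)
    return "No answer within the turn budget."

print(research("What shift does the survey describe?"))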

GUI Agent

The Challenge:

GUI agents need embodied interaction with visual interfaces:

  1. Understand screen content
  2. Plan actions
  3. Execute interactions
  4. Adapt to feedback

Pipeline-based Approach:

  • Vision-language models + action executors
  • External planning for action sequences
  • Template-based interaction patterns

Model-native Approach:

  • VTool-R1 (2025): RL for multimodal tool use
  • DeepEyes (2025): “Thinking with images” via RL
  • WebSynthesis (2025): World-model-guided MCTS

Key Innovation:

Models learn to reason about visual interfaces and plan interactions through RL, enabling more adaptive GUI agents.
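
A rough sketch of what a GUI agent's action interface can look like is shown below; the action names and fields are assumptions for illustration, since each system defines its own schema. The key point is that a model-native agent emits such actions directly from screenshots rather than following interaction templates.

# Assumed action schema for illustration; real GUI agents define their own.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GuiAction:
    kind: str                      # "click" | "type" | "scroll" | "done"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None

def execute(action: GuiAction) -> str:
    # Stand-in executor; a real one would drive a browser or OS automation layer.
    if action.kind == "click":
        return f"clicked at ({action.x}, {action.y})"
    if action.kind == "type":
        return f"typed '{action.text}'"
    if action.kind == "scroll":
        return "scrolled"
    return "episode finished"

print(execute(GuiAction(kind="click", x=120, y=48)))
print(execute(GuiAction(kind="type", text="model-native agents")))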


🔮 Future Directions: Emerging Model-native Capabilities

Multi-agent Collaboration

Current State (Pipeline-based):

  • AutoGen (2023): Orchestrates multiple agents via conversation
  • CrewAI: Framework for multi-agent systems
  • External coordination logic

Emerging (Model-native):

  • G-MEM (2024): Hierarchical memory for multi-agent systems
  • Intrinsic Memory Agents (2025): Structured contextual memory
  • RCR-Router (2025): Role-aware context routing

Vision:

Models learn to collaborate and coordinate with other agents through RL, rather than following scripted protocols.

Reflection

Current State (Pipeline-based):

  • Reflexion (2023): External reflection module
  • CRITIC (2023): Self-correction with tool-interactive critiquing
  • Scripted reflection loops

Emerging (Model-native):

  • Self-RAG (2023): Learned retrieval, generation, critique
  • RAGEN (2025): Self-evolution via multi-turn RL
  • Memory-R1 (2025): RL for memory and reflection

Vision:

Models learn to reflect on their own outputs and self-improve through RL, enabling continuous learning.
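
The pipeline version of this idea is a scripted generate-critique-revise loop, sketched below with stubbed functions standing in for model calls; in the model-native version, the decision of when and how to reflect would itself be learned rather than hard-coded.

# Scripted generate-critique-revise loop (pipeline-style); the three functions
# are stubs standing in for model calls. A model-native agent would learn when
# and how to run this loop instead of following it verbatim.
def draft(task: str) -> str:
    return f"First attempt at: {task}"

def critique(answer: str) -> str:
    return "" if "revised" in answer else "Missing detail; add a concrete example."

def revise(answer: str, feedback: str) -> str:
    return f"{answer} [revised per feedback: {feedback}]"

def reflect_and_improve(task: str, max_rounds: int = 3) -> str:
    answer = draft(task)
    for _ in range(max_rounds):
        feedback = critique(answer)
        if not feedback:            # critic is satisfied
            break
        answer = revise(answer, feedback)
    return answer

print(reflect_and_improve("summarize the paradigm shift"))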


📊 Key Takeaways

| Insight | Implication | Evidence |
| --- | --- | --- |
| RL enables the paradigm shift | Outcome-driven learning > imitation learning | o1, R1, QwQ models |
| Capabilities can be internalized | Less external orchestration needed | Model-native planning, tool use, memory |
| A unified framework emerges | LLM + RL + Task works across domains | Language, vision, embodied |
| Performance improves | Learned > scripted for complex tasks | Deep research, GUI agents |
| Interpretability decreases | Trade-off for learned capabilities | Less modular, more black-box |

Why This Matters

As someone who’s built both pipeline-based and learning-based systems, here’s what excites me:

  1. Less Engineering, More Learning: Instead of orchestrating components, we train models to learn capabilities
  2. Better Performance: Learned capabilities often outperform scripted ones for complex tasks
  3. Unified Framework: LLM + RL + Task works across language, vision, and embodied domains
  4. Continuous Improvement: Models can improve through experience, not just retraining
  5. Emergent Capabilities: New behaviors emerge from training that weren’t explicitly programmed

The Trade-offs:

  • Interpretability: Less modular, harder to debug
  • Control: Less explicit control over agent behavior
  • Training Cost: RL training is expensive
  • Data Requirements: Need task environments for RL

🤔 New Questions This Raises

  1. How do we balance learned vs. scripted? When should capabilities be learned vs. explicitly programmed?

  2. What’s the role of external tools? As models learn tool use, do we still need external tool APIs?

  3. How do we ensure safety? Learned behaviors are harder to verify and control—how do we ensure agents behave safely?

  4. What about interpretability? How do we understand and debug model-native agents?

  5. Scaling RL training: How do we make RL training more efficient and scalable?

  6. Hybrid approaches: Can we combine pipeline-based and model-native approaches for best of both worlds?

Next Steps: Explore specific model-native implementations, understand RL training methodologies, and experiment with building model-native agents.


References

Survey Paper:

Model-native Planning:

Model-native Tool Use:

Model-native Memory:

Reinforcement Learning for LLMs:

Pipeline-based Approaches (For Comparison):

Applications:

Future Directions:

Implementation Resources:

Related Surveys:

This post is licensed under CC BY 4.0 by the author.