Post

Feynman: An Open-Source AI Research Agent That Actually Thinks Beyond Search

🤔 Curiosity: Why do most “research agents” still feel like glorified search boxes?

A lot of AI tools now claim to help with research.

They search the web, summarize papers, and generate drafts quickly. That sounds useful, but most of them still break at the exact moment real research work begins.

The problem is not finding one answer. The problem is managing a messy research workflow:

  • searching papers and web sources together
  • comparing claims across sources
  • checking whether a paper’s public code actually matches the write-up
  • running experiments safely
  • keeping citations intact
  • turning notes into something a person can actually reuse

That is why Feynman caught my attention.

According to the official GitHub repository and project website, Feynman is not positioning itself as a generic chat assistant. It is explicitly framed as “the open source AI research agent”, and that difference matters. The product pitch is not “ask me anything.” The pitch is closer to: give me a research workflow, and I will operate inside it.

That is a much more interesting direction.

If you care about practical AI systems, the real question is not whether an agent can summarize a paper. The real question is whether it can support a research loop that includes evidence gathering, review, verification, and even experiment execution.

That is the standard Feynman is trying to meet.


📚 Retrieve: What Feynman actually ships

From the official repository, Feynman currently presents itself as a research-first CLI built on Pi and alphaXiv, with version 0.2.17 released on 2026-04-10.

1) It is built around research workflows, not just one-off prompts

The homepage and README make the workflow orientation very explicit.

Examples from the official docs include:

1
2
3
4
5
feynman "what do we know about scaling laws"
feynman deepresearch "mechanistic interpretability"
feynman lit "RLHF alternatives"
feynman audit 2401.12345
feynman replicate "chain-of-thought improves math"

That is already a stronger product definition than most agent launches.

Instead of pretending all knowledge work is one shape, Feynman separates several distinct jobs:

  • source-heavy deep research
  • literature review
  • simulated peer review
  • paper-versus-code auditing
  • replication planning and execution
  • source comparison
  • paper-style drafting
  • recurring research watch loops

That matters because “research” is not one thing. A literature review and a reproducibility audit are different tasks, and a decent agent should admit that.

2) The agent roles are surprisingly practical

The project bundles four research agents:

  • Researcher
  • Reviewer
  • Writer
  • Verifier

This is one of the most important parts of the design.

A lot of agent tools still act like one model should gather facts, interpret them, write the answer, and validate itself in the same pass. That tends to look efficient in a demo and sloppy in actual use.

Feynman’s split is much healthier:

  • the Researcher gathers evidence
  • the Reviewer critiques quality
  • the Writer structures output
  • the Verifier checks citations and URLs

Even if the implementation evolves, that orchestration philosophy is the right one. Research quality usually improves when evidence collection, writing, and verification are not collapsed into a single monologue.

3) It connects the paper layer to the execution layer

This is where the project becomes more than a “smart search” wrapper.

The official materials describe integrations around:

  • alphaXiv for paper search, Q&A, code reading, and annotation
  • web search for broader external retrieval
  • session search for recalling previous work
  • preview export for browser and PDF artifacts
  • Docker for isolated local experiment execution
  • Modal for burst GPU compute
  • RunPod for longer-lived GPU workloads

That stack reveals the real ambition.

Feynman is trying to connect three worlds that are usually fragmented:

  1. reading papers
  2. verifying claims against code and sources
  3. running experiments when needed

That is the right stack for anyone who thinks research should be operational, not just descriptive.

4) The distribution model is builder-friendly

The 2026-04-10 release, v0.2.17, ships standalone native bundles for macOS, Linux, and Windows, and the installer pulls tagged releases instead of arbitrary main branch state.

The README also highlights a skills-only installation mode.

That detail is more important than it sounds.

A lot of developers do not want another full terminal application with another runtime and another state model. Sometimes they only want the reusable research workflows and prompts. Feynman’s skills-only path acknowledges that reality.

The project also documents a local-model path through a custom provider flow, including Ollama via an OpenAI-compatible endpoint at http://localhost:11434/v1.

That lowers the barrier for people who want more control over cost, privacy, or model routing.

5) The release cadence suggests active packaging and workflow work

Recent commit activity around the 2026-04-10 release includes changes like:

  • switch release workflow to binary only
  • verify rpc and website gates
  • remove stale web access override
  • document local model setup
  • harden installers rendering and dependency hygiene
  • fix startup packaging and content guardrails

That is not flashy benchmark theater. It is infrastructure work.

And frankly, that is a good sign.

A serious agent project is not only about adding new prompts. It is about making setup, packaging, runtime behavior, and operational safety less fragile.


🖼️ Source Images Worth Including

Official hero image

Feynman CLI hero image

GitHub preview image

Feynman GitHub preview

These are enough to ground the post visually without turning it into a screenshot dump.


Concrete example: Where Feynman could be genuinely useful

Imagine you are investigating a new LLM paper that claims better coding reliability.

A shallow workflow would look like this:

  • read the abstract
  • skim a few charts
  • ask a chatbot for a summary
  • tweet a conclusion too early

A Feynman-style workflow is much more interesting:

  1. Run a literature review on the paper’s topic area.
  2. Use deep research to gather competing papers and external commentary.
  3. Audit the paper against the linked repository.
  4. Replicate one key experiment locally or on GPU infrastructure.
  5. Draft a paper-style output with citations.
  6. Re-run verification so the references are still valid.

That is not just “content generation.” It is a lightweight research operations stack.

For solo builders, independent researchers, or technical writers, that is arguably the right abstraction. The real bottleneck is rarely typing fast. It is preserving research discipline across multiple steps without dropping provenance.


A simplified mental model

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def run_research_question(topic):
    evidence = researcher.collect(
        papers=True,
        web=True,
        repos=True,
        docs=True,
        query=topic,
    )

    review = reviewer.critique(evidence)
    draft = writer.compose(evidence, review)
    verified = verifier.check_links_and_citations(draft)

    return verified

That is not the literal implementation. It is the product idea in compact form.

And it explains why Feynman feels more promising than a generic “AI for research” label.


The most important thing about Feynman is not that it can search papers. A lot of tools can do that.

The important thing is that it treats research as an operational workflow with distinct stages:

  • retrieve
  • compare
  • review
  • verify
  • replicate
  • publish

That is a stronger product philosophy than most current “research agents.”

I think this matters for three reasons.

1) It pushes agents toward accountable outputs

The project repeatedly emphasizes cited, source-grounded output and URL verification.

That sounds basic, but it is still one of the biggest weaknesses in AI-assisted research. Many systems can sound informed while quietly decoupling their output from real evidence.

Any agent that wants to be useful in research has to make provenance first-class. Feynman at least appears to understand that.

2) It treats reproducibility as part of the interface

The /audit and /replicate ideas are especially strong.

Too many AI research tools stop at summarization. But the gap between a paper’s claim and the behavior of its public code is often where the real story lives.

If Feynman keeps improving its audit and replication loops, it could become useful not only for researchers, but also for:

  • technical due diligence
  • model vendor evaluation
  • internal R&D reviews
  • ML engineering retrospectives
  • high-signal technical blogging

That is much more valuable than another generic “AI researcher” chatbot.

3) It suggests a future where research agents become infrastructure

The long-term opportunity is not one magical agent that knows everything. The opportunity is a dependable workflow layer for evidence-driven work.

In that future, a research agent is less like a search engine and more like a compact operating system for technical investigation.

Feynman is not fully there yet, of course. The project is still early, and early agent products often look cleaner in the README than in repeated real-world use.

But the direction is solid.

If the team can keep improving:

  • source reliability
  • experiment reproducibility
  • model and provider flexibility
  • artifact export
  • failure recovery in long research loops

then Feynman could become one of the more interesting open-source projects in the research-agent category.


Final Thoughts

My main takeaway is simple.

Feynman is interesting not because it promises “AI research,” but because it tries to formalize the parts that usually get hand-waved away:

  • evidence gathering
  • role separation
  • citation verification
  • code and paper comparison
  • experiment execution

That makes it feel less like a search demo and more like the beginning of a research workflow system.

And that is the right direction.

The next generation of useful AI research tools will probably not win by sounding smarter in chat. They will win by making evidence-heavy work easier to run, easier to verify, and easier to repeat.

Feynman looks much closer to that future than most.

Sources

  • GitHub repository: https://github.com/getcompanion-ai/feynman
  • Project site: https://feynman.is/
  • README: https://raw.githubusercontent.com/getcompanion-ai/feynman/main/README.md
  • Package metadata: https://raw.githubusercontent.com/getcompanion-ai/feynman/main/package.json
  • Releases: https://github.com/getcompanion-ai/feynman/releases
This post is licensed under CC BY 4.0 by the author.