# 1.3 The LLM-Modulo Framework

The 2024 ICML position paper by Kambhampati, Valmeekam, Guan, Stechly, and colleagues, "LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks," sharpens the diagnosis and proposes a constructive alternative. It is required reading for any practitioner building planning-capable AI systems.

## The Core Diagnosis

The paper argues, with supporting empirical evidence, that LLMs cannot serve as *autonomous* planners for the following fundamental reason: **planning requires verifying that a proposed action sequence achieves the goal from the initial state — and LLMs have no reliable mechanism for this verification.**

More precisely:

* LLMs trained on planning-related text learn to *generate* sequences that pattern-match to plans they have seen.
* They cannot *simulate* state transitions in a verifiably correct manner.
* They cannot *detect* when their own output violates a precondition or misses the goal.
* When asked to self-evaluate and correct, they often produce different-but-equally-wrong outputs (Stechly et al., 2023; Kambhampati et al., 2024).
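
The verification step named in the diagnosis can be made concrete with a small simulator. The following is a minimal sketch, assuming a STRIPS-style encoding in which states are sets of facts and each action carries preconditions, add effects, and delete effects. The `Action` class, the `validate_plan` function, and the blocks-world facts are illustrative names chosen here, not code from the paper or from PlanBench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (illustrative encoding)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def validate_plan(initial_state, goal, plan):
    """Simulate the plan step by step and return (ok, message)."""
    state = set(initial_state)
    for step, action in enumerate(plan, start=1):
        missing = action.preconditions - state
        if missing:
            return False, f"step {step} ({action.name}): unmet preconditions {sorted(missing)}"
        # Apply the state transition: remove deleted facts, then add new ones.
        state = (state - action.delete_effects) | action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: missing {sorted(unmet)}"
    return True, "plan is valid"


# Toy blocks-world fragment: A sits on B, and the goal is both blocks on the table.
unstack_a = Action(
    "unstack A B",
    frozenset({"on A B", "clear A", "handempty"}),
    frozenset({"holding A", "clear B"}),
    frozenset({"on A B", "clear A", "handempty"}),
)
putdown_a = Action(
    "putdown A",
    frozenset({"holding A"}),
    frozenset({"ontable A", "clear A", "handempty"}),
    frozenset({"holding A"}),
)

initial = {"on A B", "ontable B", "clear A", "handempty"}
goal = {"ontable A", "ontable B"}

print(validate_plan(initial, goal, [unstack_a, putdown_a]))  # correct order
print(validate_plan(initial, goal, [putdown_a, unstack_a]))  # precondition violated at step 1
```

A check of this kind is deterministic and sound with respect to the action model it is given, which is the property the paper argues LLM self-evaluation lacks.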

To ground these claims empirically, the paper uses **PlanBench** (Valmeekam et al., 2023), a benchmark designed specifically to measure LLM planning capability across problem sizes and domains. On PlanBench, even state-of-the-art LLMs with chain-of-thought prompting perform poorly on problems with more than a handful of steps.

## The LLM-Modulo Framework

The constructive contribution is the **LLM-Modulo framework**, which reframes the role of LLMs from autonomous planners to *collaborative contributors* within a verified planning loop (Kambhampati et al., 2024).

The framework operates as follows:

```
Human (goal/feedback)
         │
         ▼
┌─────────────────────────────────────┐
│        LLM-Modulo System            │
│                                     │
│  ┌──────────┐     ┌──────────────┐  │
│  │   LLM    │────▶│   Critics    │  │
│  │ (Generate│◀────│  (Verify)    │  │
│  │  Ideas)  │     │              │  │
│  └──────────┘     └──────────────┘  │
│       ▲                 │           │
│       └─────────────────┘           │
│         Iterate until valid         │
└─────────────────────────────────────┘
         │
         ▼
   Verified Plan
```

The **LLM** generates candidate plans, translates natural language into structured representations, identifies sub-goals, proposes heuristics, and explains plan steps. It does what it does well: leveraging vast compressed knowledge about the world expressed in language.

The **Critics** are the verification layer — they may include:

* **Symbolic planners** — optimal planners (Fast Downward in optimal mode) that verify both correctness and plan quality; satisficing planners (LAMA, Fast Downward with greedy search) that verify correctness without optimality guarantees.
* **Domain-specific verifiers** (physics simulators, safety checkers, formal model checkers).
* **Human experts** who evaluate semantic appropriateness.
* **Other AI systems** (e.g., a separate LLM acting as a critic).

A practical architectural refinement, demonstrated in the HAIMEDA production system (Sigloch & Benzmüller, 2026), is **asymmetric pre/post-generation verification**: formal methods with decidable completeness verify that *input prompts* conform to structured type constraints *before* LLM generation; neural embedding-based similarity validates *generated outputs* for semantic faithfulness *after* generation. This asymmetry is important: formal input verification eliminates structurally inconsistent prompts before hallucination opportunities arise, while neural output verification catches semantic fabrications that formal methods cannot express. Applied to medical device damage assessment, this pipeline detected 83% of structured-entity hallucinations and 72% of semantic fabrications, and reduced report creation time by 30% (Sigloch & Benzmüller, 2026).
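
The shape of such an asymmetric check can be sketched in a few lines. This is not the HAIMEDA implementation: the schema `REPORT_SCHEMA`, the field names, the `0.5` threshold, and the stub generators are all hypothetical, and a bag-of-words cosine similarity stands in for the neural embedding model that a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical input schema: field name -> required Python type.
REPORT_SCHEMA = {"device_id": str, "damage_code": int, "description": str}


def verify_input(record, schema=REPORT_SCHEMA):
    """Pre-generation check: return a list of schema violations (empty means OK)."""
    errors = [f"missing field: {key}" for key in schema if key not in record]
    errors += [
        f"{key}: expected {typ.__name__}, got {type(record[key]).__name__}"
        for key, typ in schema.items()
        if key in record and not isinstance(record[key], typ)
    ]
    return errors


def _bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Cosine similarity of word counts; a stand-in for a learned embedding model."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[word] * vb[word] for word in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def guarded_generate(record, generate, threshold=0.5):
    """Run the deterministic input check, call the generator, then the similarity check."""
    errors = verify_input(record)
    if errors:
        return {"status": "rejected_input", "errors": errors}  # generator is never called
    text = generate(record)
    if similarity(record["description"], text) < threshold:
        return {"status": "rejected_output", "text": text}
    return {"status": "accepted", "text": text}


def faithful_stub(record):
    """Stub standing in for an LLM call that stays close to the source."""
    return "The housing is cracked near the power connector."


def fabricated_stub(record):
    """Stub standing in for an LLM call that invents unrelated damage."""
    return "Water ingress destroyed the main battery."


record = {"device_id": "D-17", "damage_code": 3,
          "description": "cracked housing near the power connector"}
print(guarded_generate(record, faithful_stub)["status"])
print(guarded_generate(record, fabricated_stub)["status"])
print(guarded_generate(dict(record, damage_code="3"), faithful_stub))
```

The design point the sketch preserves is the ordering: the deterministic check runs first and can veto generation entirely, while the approximate check only ever judges text that was produced from a well-formed input.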

The loop iterates until the plan satisfies all critics or exhausts a budget. This is not merely a pipeline — it is an architectural pattern in which LLMs contribute *approximate knowledge* and symbolic systems enforce *formal correctness*.
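
The generate-and-critique loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's reference implementation: `llm_modulo_loop`, the critic signature (return `None` to accept, or a feedback string to reject), and the stub generator that replays canned candidates in place of a real LLM call are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]
# A critic returns None when it accepts the plan, otherwise a feedback message.
Critic = Callable[[Plan], Optional[str]]
Generator = Callable[[str, List[str]], Plan]


def llm_modulo_loop(goal: str, generate: Generator, critics: List[Critic],
                    budget: int = 5) -> Tuple[Optional[Plan], int]:
    """Iterate until every critic accepts or the round budget is exhausted."""
    feedback: List[str] = []
    for round_number in range(1, budget + 1):
        candidate = generate(goal, feedback)
        # Collect the critiques from this round to pass back to the generator.
        feedback = []
        for critic in critics:
            message = critic(candidate)
            if message is not None:
                feedback.append(message)
        if not feedback:
            return candidate, round_number
    return None, budget


def make_stub_generator(candidates: List[Plan]) -> Generator:
    """Stand-in for an LLM call: replays canned candidates, one per round."""
    remaining = iter(candidates)

    def generate(goal: str, feedback: List[str]) -> Plan:
        return next(remaining)

    return generate


def starts_with_boil(plan: Plan) -> Optional[str]:
    return None if plan and plan[0] == "boil water" else "first step must be 'boil water'"


def no_duplicates(plan: Plan) -> Optional[str]:
    return None if len(set(plan)) == len(plan) else "plan repeats a step"


candidates = [
    ["add tea", "boil water"],              # rejected: wrong first step
    ["boil water", "add tea", "add tea"],   # rejected: repeated step
    ["boil water", "add tea", "pour"],      # accepted by both critics
]
critics = [starts_with_boil, no_duplicates]

print(llm_modulo_loop("make tea", make_stub_generator(candidates), critics))
print(llm_modulo_loop("make tea", make_stub_generator(candidates), critics, budget=2))
```

Returning `None` when the budget runs out keeps the guarantee explicit: the loop either returns a plan that every critic accepted or returns no plan at all.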

## LLM Roles in LLM-Modulo

Kambhampati et al. (2024) enumerate the roles LLMs can reliably play within the LLM-Modulo framework:

| Role                         | What the LLM Does                                       | Who Verifies                                                                                                                    |
| ---------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| **Plan candidate generator** | Proposes action sequences from natural language         | Symbolic planner / domain verifier                                                                                              |
| **Goal reformulator**        | Translates ambiguous goals to formal specs              | Human / formal checker                                                                                                          |
| **Sub-goal identifier**      | Decomposes complex goals into achievable steps          | Planner / logical validator                                                                                                     |
| **Heuristic generator**      | Suggests domain-specific search heuristics              | Planner performance (empirical)                                                                                                 |
| **Plan explainer**           | Produces natural language rationale for a verified plan | Human                                                                                                                           |
| **NL → PDDL translator**     | Converts natural language problem descriptions to PDDL  | Planner (syntactic correctness only — semantic faithfulness to the intended problem requires human or domain-expert validation) |

This taxonomy is directly actionable for system architects. The key insight: *stop asking LLMs to plan autonomously; start asking them to contribute their genuine strengths to a verified pipeline.*
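
The caveat in the NL → PDDL row, that a planner can confirm syntax but not intent, can be illustrated with a deliberately shallow check. The function `looks_like_pddl_problem` below is a hypothetical stand-in and not a real PDDL parser: it only tests for balanced parentheses and the presence of the `define`, `:init`, and `:goal` sections.

```python
def looks_like_pddl_problem(text):
    """Shallow syntactic check: balanced parentheses plus required sections."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    required = ("(define", "(:init", "(:goal")
    return depth == 0 and all(section in text for section in required)


# Suppose the user asked for block b to end up on top of block a.
intended = "(define (problem p1) (:domain blocks) (:init (on a b)) (:goal (on b a)))"
# Well-formed, but the goal restates the initial state instead of the request.
mistranslated = "(define (problem p1) (:domain blocks) (:init (on a b)) (:goal (on a b)))"
# Missing the final closing parenthesis.
truncated = "(define (problem p1) (:domain blocks) (:init (on a b)) (:goal (on b a))"

for name, text in [("intended", intended), ("mistranslated", mistranslated), ("truncated", truncated)]:
    print(name, looks_like_pddl_problem(text))
```

Both the intended and the mistranslated problem pass, and only the truncated one fails. Nothing in the syntax reveals which goal the user meant, which is why the table assigns semantic faithfulness to a human or domain expert.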

*Reference:* Sigloch, Paul, and Christoph Benzmüller. "Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains." *Proceedings of KI 2026 (German Conference on Artificial Intelligence)*, 2026. arXiv:2605.26942. <https://arxiv.org/abs/2605.26942>

> **Next:** [§1.4 — Neural vs. Symbolic: A Fair Comparison](/neuro-symbolic-ai-in-practice/part-i-motivation/chapter-1/1-4-neural-vs-symbolic.md) provides the precise accounting of genuine strengths and weaknesses that forms the basis for principled combination.
