> For the complete documentation index, see [llms.txt](https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/part-ii-background/chapter-3/3-3-problem-classes.md).

# 3.3 Planning Problem Classes

Real-world planning problems rarely fit the clean classical model of Section 3.1. The following variations capture the practical complexity of domains that practitioners actually encounter.

## Oversubscription Planning (OSP)

Classical planning assumes all goals in $G$ *must* be achieved. **Oversubscription Planning (OSP)** relaxes this: goals have associated utilities and the planner must maximize total utility subject to a plan cost bound $B$ (Smith, 2004; Van den Briel et al., 2004; Mirkis & Domshlak, 2014).

Formally, in OSP each goal $g \in G$ has a utility value $u(g) \geq 0$. The planner must respect a budget constraint $\text{cost}(\pi) \leq B$, and the objective is: $\max \sum\_{g \in G\_\pi} u(g)$, where $G\_\pi$ is the set of goals achieved by plan $\pi$.

OSP is particularly relevant in resource-constrained domains: space mission scheduling (many science goals compete for limited power and data bandwidth), emergency response (limited personnel must be allocated to maximize lives saved), and e-commerce logistics (a delivery fleet can serve a subset of customers within daily route cost limits).

**Complexity:** NP-hard in general (equivalent to a weighted MAX-SAT or knapsack-style problem). Approximation algorithms exist; MCTS and greedy approaches are common in practice (Smith, 2004; Van den Briel et al., 2004).

> \[!TIP] **When to use OSP:** Choose OSP when your domain has more goals than resources can satisfy and you must maximize value delivered — space mission planning, logistics with budget caps, emergency response triage. If all goals must be achieved, use classical planning (§3.1) instead.

*References:*\
Smith, David E. "Choosing Objectives in Over-Subscription Planning." *Proceedings of ICAPS*, 2004, pp. 393–401. Van den Briel, Menkes, et al. "Effective Approaches for Partial Satisfaction Planning." *Proceedings of AAAI*, 2004, pp. 562–569. Mirkis, Vitaly, and Carmel Domshlak. "Landmarks in Oversubscription Planning." *Proceedings of ECAI*, 2014. <https://doi.org/10.3233/978-1-61499-419-0-621>

The OSP model handles incomplete goal satisfaction under resource constraints. The next two models handle something qualitatively different: uncertainty about the world itself.

## Conformant Planning

**Problem:** Planning with **no observability** and **non-deterministic effects**.

In conformant planning, the agent does not know the exact initial state (it is given as a set of possible initial states) and cannot observe the world during execution. A plan must succeed *for all possible initial states* (Goldman & Boddy, 1996).

The plan is a fixed action sequence (not a policy) that achieves the goal regardless of which initial state obtains and regardless of which effects occur when actions are non-deterministic. This is the hardest basic planning setting: the agent must be correct in all worlds simultaneously.

**Complexity:** Σ₂ᴾ-complete for conformant planning with non-deterministic actions and unknown initial state (Rintanen, 1999). This places conformant planning above NP and co-NP in the polynomial hierarchy — harder than classical plan existence when plan length is bounded, but the exact relationship to PSPACE-complete classical planning depends on the specific model variant.

*References:* Goldman, Robert P., and Mark S. Boddy. "Expressive Planning and Explicit Knowledge." *Proceedings of AIPS*, 1996, pp. 110–117. Rintanen, Jussi. "Constructing Conditional Plans by a Theorem-Prover." *Journal of Artificial Intelligence Research* 10 (1999): 323–352.

> \[!WARNING] **Limitation:** Conformant plans tend to be conservative — the agent must succeed in all worlds, so it takes safe actions that work everywhere rather than optimal actions for any particular world. This pessimism leads to long, expensive plans. When observations are available at all, contingent planning is typically a better fit.

When at least some sensing is possible, the conformant model is unnecessarily pessimistic. Contingent planning extends it by allowing the agent to observe and branch.

## Contingent Planning

**Problem:** Planning with **partial observability** and **sensing actions**.

Contingent planning extends conformant planning by allowing **conditional branches** based on observations. The plan is not a sequence but a **policy** — a function from observation histories to actions. At decision points, the agent performs a sensing action, observes the result, and selects the next action accordingly (Peot & Smith, 1992).

```
if observe(light_on):
    continue
else:
    flip_switch()
```

This is the structure of most real robotic systems: sense → decide → act → repeat. Contingent planning formalizes this loop with formal correctness guarantees.

**Complexity:** PSPACE-complete for contingent planning with full observability and branching policies over non-deterministic actions (Rintanen, 2004). This matches classical plan existence — the policy tree may be exponentially large but can be evaluated in polynomial space.

*References:* Peot, Mark A., and David E. Smith. "Conditional Nonlinear Planning." *Proceedings of AIPS*, 1992, pp. 189–197. Rintanen, Jussi. "Complexity of Planning with Partial Observability." *Proceedings of ICAPS*, 2004.

> \[!TIP] **When to use contingent planning:** Use it when actions may have uncertain outcomes and the agent can sense the world between steps — the classic sense–plan–act loop. Most robotic planning architectures are contingent in nature. If outcomes are probabilistic rather than merely non-deterministic, an MDP or POMDP model provides richer structure.

Conformant and contingent planning treat uncertainty as a worst-case problem: the plan must work despite adverse outcomes. The MDP and POMDP models instead represent uncertainty probabilistically and optimize expected performance.

## MDP Planning — Stochastic, Fully Observable

**Model:** **Markov Decision Process (MDP)**.

$$\text{MDP} = \langle S, A, T, R, \gamma \rangle$$

Where $S$ is a state space, $A$ is an action set, $T: S \times A \times S \to \[0,1]$ is a stochastic transition function, $R: S \times A \to \mathbb{R}$ is a reward function, and $\gamma \in \[0,1)$ is a discount factor (Puterman, 1994).

The agent observes the full state at each step (full observability), but actions have stochastic outcomes. The objective is to find a **policy** $\pi: S \to A$ that maximizes expected cumulative discounted reward $\mathbb{E}\[\sum\_{t=0}^\infty \gamma^t R(s\_t, a\_t)]$.

**Complexity:** Polynomial for finite MDPs — value iteration converges in time polynomial in $|S|^2 \cdot |A|$ per iteration, and policy iteration terminates in a polynomial number of iterations. MDP solving is P-complete (Littman et al., 1995). This tractability contrasts sharply with the PSPACE-hardness of POMDP and the NP-hardness of OSP.

**Solution methods:** Value iteration, policy iteration, Q-learning (when $T$ is unknown). In neuro-symbolic settings, deep reinforcement learning (neural) is combined with formal reward shaping or safety constraints (symbolic) to produce policies that are both high-performing and verifiably safe.

*Reference:* Puterman, Martin L. *Markov Decision Processes: Discrete Stochastic Dynamic Programming*. Wiley, 1994. <https://doi.org/10.1002/9780470316887>

> \[!TIP] **When to use MDPs:** Use an MDP when actions have stochastic outcomes (known probabilities), the agent can observe the full state after each action, and the goal is to maximize long-term expected reward. MDPs are the right model for inventory management, robot navigation with stochastic motion, and game AI with random events.

MDPs assume full state observability. When the agent receives only partial or noisy observations of the world — the norm in physical systems — the POMDP model is required.

## POMDP Planning — Stochastic, Partially Observable

**Model:** **Partially Observable Markov Decision Process (POMDP)**.

$$\text{POMDP} = \langle S, A, T, R, \Omega, O, \gamma \rangle$$

Where $\Omega$ is an observation space and $O: S \times A \times \Omega \to \[0,1]$ is an observation function. The agent does not observe $s$ directly; it receives an observation $o \in \Omega$ distributed according to $O$ (Kaelbling et al., 1998).

The agent maintains a **belief state** $b: S \to \[0,1]$ — a probability distribution over current states — and updates it using Bayes' rule after each action and observation. Planning in POMDP is over the belief-state space, which is a continuous space even when $S$ is finite.

**Complexity:** PSPACE-hard for finite-horizon POMDP (Papadimitriou & Tsitsiklis, 1987); undecidable in the infinite-horizon case with unbounded state spaces. In practice, **point-based solvers** (PBVI, SARSOP, PERSEUS) make finite-horizon POMDP planning tractable by approximating the value function over a finite set of belief points.

**Connection to neuro-symbolic AI:** POMDPs with large, continuous observation spaces (images, sensor arrays) are intractable for pure symbolic approaches. Neural perception components process raw observations into compact belief representations; symbolic POMDP solvers then plan over those representations. This is a canonical neuro-symbolic decomposition.

> \[!TIP] **When to use POMDPs:** Use a POMDP when the agent must act under partial observability — it cannot see the full system state — and action outcomes are stochastic. POMDPs are the appropriate model for medical treatment decisions (the patient's true state is partially hidden), dialogue systems (user intent is latent), and robotic manipulation with noisy sensors.

> \[!WARNING] **Limitation:** Exact POMDP solving is PSPACE-hard for finite horizons (Papadimitriou & Tsitsiklis, 1987) and undecidable in the infinite-horizon case (Madani et al., 2003). For large state or observation spaces — such as pixel-level robot observations — pure symbolic POMDP solvers are intractable. Neuro-symbolic approaches that replace the symbolic belief state with a neural latent representation are the current frontier (see §4.3).

*Reference:* Kaelbling, Leslie Pack, Michael L. Littman, and Anthony R. Cassandra. "Planning and Acting in Partially Observable Stochastic Domains." *Artificial Intelligence* 101.1–2 (1998): 99–134. <https://doi.org/10.1016/S0004-3702(98)00023-X>


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/part-ii-background/chapter-3/3-3-problem-classes.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
