> For the complete documentation index, see [llms.txt](https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/part-iii-core-approaches/chapter-4/4-2-neural-helps-symbolic/4-2-neural-subroutines.md).

# 4.2.2 Neural Subroutines in Symbolic Systems

**Definition:** Using neural networks to rapidly evaluate states, estimate heuristic values, or generate candidate moves within an otherwise classical symbolic framework — particularly within tree search and planning algorithms (Garcez & Lamb, 2023).

This is the pattern that produced AlphaGo, AlphaProof, and a new generation of neuro-symbolic planning systems. The symbolic framework (Monte Carlo Tree Search, A\* search, SAT solver) provides structure, completeness, and correctness guarantees. The neural component provides *guidance* — dramatically reducing the search space that the symbolic system must explore.

### Neural Heuristics for Planning

In classical planning, a **heuristic function** $h(s)$ estimates the distance from state $s$ to the goal. Good heuristics dramatically reduce the number of states explored by A\* and greedy best-first search. Computing good heuristics analytically (e.g., LM-cut, FF relaxation) is itself computationally expensive.

**Neural heuristics** train a neural network to predict heuristic values directly from state features:

1. A symbolic planner generates training data: (state, optimal distance to goal) pairs.
2. A neural network (typically a GNN or MLP operating on the PDDL fact set) learns to predict optimal distances.
3. At test time, the neural heuristic guides the symbolic planner's search — replacing or augmenting the hand-crafted heuristic.

**STRIPS-HGN** (Shen et al., 2020) demonstrated that hypergraph networks trained on planning instances generalize across instances, providing strong heuristic estimates with near-zero inference overhead compared to analytical heuristic computation.

*Reference:*\
Shen, William, Felipe Trevizan, and Sylvie Thiébaux. "Learning Domain-Independent Planning Heuristics with Hypergraph Networks." *Proceedings of ICAPS*, 2020. <https://ojs.aaai.org/index.php/ICAPS/article/view/6737> | Code: <https://github.com/ischubert/hgn>

### Neural MCTS — The AlphaGo/AlphaProof Pattern

**Monte Carlo Tree Search (MCTS)** with neural value and policy networks is perhaps the most successful pattern in modern neuro-symbolic AI (Silver et al., 2016).

The architecture, introduced in AlphaGo and refined through AlphaZero and AlphaProof, operates as follows:

**Four phases of MCTS:**

```
┌────────────────────────────────────────────────────┐
│                   MCTS Loop                        │
│                                                    │
│  1. SELECT        Walk tree from root using **PUCT**  │
│     (Symbolic)    (Predictor + UCT: policy prior P(s,a) weights exploration) until unexpanded node │
│                                                    │
│  2. EXPAND        Neural policy network proposes   │
│     (Neural)      candidate next moves/actions     │
│                                                    │
│  3. EVALUATE      Neural value network estimates   │
│     (Neural)      win probability / goal distance  │
│                                                    │
│  4. BACKPROPAGATE Update node statistics up tree  │
│     (Symbolic)                                     │
│                                                    │
│  Repeat until budget exhausted                     │
└────────────────────────────────────────────────────┘
                │
                ▼
    Select highest-visit-count action
                │
                ▼
    Symbolic verifier (correctness check)
```

The key insight: the **policy network** prunes the search tree (ignoring low-probability branches), while the **value network** provides a fast estimate of subtree quality (avoiding expensive rollouts to terminal states). The symbolic MCTS framework ensures that only valid moves are considered and that statistics are correctly maintained. The neural components provide guidance but never override correctness.

**In AlphaProof:** The "actions" are Lean 4 proof tactics. The "states" are proof contexts. The "value function" estimates how close the current proof context is to a complete proof. The verifier is Lean 4's type checker — a sound and decidable formal oracle (AlphaProof Team et al., 2024). For a full architectural treatment of AlphaProof, see Section 4.3.

**Generalization:** This pattern applies to any symbolic search problem where the search space is too large for exhaustive exploration, neural guidance can effectively prune the search, and a symbolic verifier can check move/action validity. Planning, theorem proving, program synthesis, and circuit design all fit this template.

### HyperTree Proof Search — The Bridge to AlphaProof

Between AlphaGo (2016) and AlphaProof (2024), the most important advance in neural theorem proving was **HyperTree Proof Search** (HTPS) by Lample et al. (NeurIPS 2022). HTPS established the feasibility of neural MCTS for formal mathematics at scale and shares key architectural elements with AlphaProof — online training from the search tree, hypertree state representation — though independent parallel development at DeepMind may account for the similarities.(Lample et al., 2022)

**Key innovations over standard Neural MCTS:**

1. **Online training from the search tree:** Rather than requiring a fixed training corpus of pre-existing proofs, HTPS trains the policy network *continuously* from the MCTS search tree itself. Every proof subtree explored — successful or not — becomes training data. This enables rapid adaptation to new problem distributions without expensive offline data collection.
2. **Proof exploration bonus:** An intrinsic reward for exploring novel proof states prevents the policy from cycling in proven sub-problems and encourages breadth across the proof space.
3. **Hypertree structure:** Unlike standard MCTS where each state has one parent, proof contexts can have multiple parents — the same subgoal may arise from multiple distinct proof paths. HTPS uses a hypertree (AND-OR graph) representation that correctly shares work across paths, avoiding redundant computation and properly crediting shared sub-proofs.

**Results:** Applied to the Metamath mathematical library (38,000 theorems), HTPS successfully proved **65.4%** of theorems with offline training — surpassing the prior state of the art of 56.5% achieved by GPT-f. With online training (continuously learning from failed proof attempts), HTPS reaches **82.6%**, dramatically exceeding all prior neural theorem provers.(Lample et al., 2022) This was the first demonstration that neural MCTS could handle the full diversity of a large mathematical corpus, not merely curated competition problems.

**Connection to AlphaProof:** AlphaProof adopts HTPS's online training principle and adapts the hypertree structure to Lean 4 tactics. The self-improvement loop described in §4.3.1 is directly built on HTPS's foundation. The key architectural evolution is replacing Metamath's simple rewrite rules with Lean 4's richer tactic language, requiring a more expressive policy network — but the fundamental MCTS + online training + hypertree structure carries forward unchanged.

*Reference:* Lample, Guillaume, et al. "HyperTree Proof Search for Neural Theorem Proving." *Advances in Neural Information Processing Systems (NeurIPS)* 35 (2022). <https://arxiv.org/abs/2205.11491> | Code: <https://github.com/facebookresearch/hypertree>

### Tactician — Supervised Tactic Learning for Coq

Parallel to HTPS on Metamath and AlphaProof on Lean 4, **Tactician** (Blaauwbroek et al., 2020) applies online tactic learning to the **Coq** proof assistant — a widely used formal verification system in both academia and industry (CompCert certified compiler, seL4 verified OS kernel).(Blaauwbroek et al., 2020)

Tactician's approach is simpler than MCTS-based methods: it uses **k-nearest-neighbor retrieval** over a database of previously successful proof steps, combined with online learning that continuously updates the tactic database as the user proves new theorems. When the user invokes `suggest` or `search`, Tactician proposes tactic sequences ranked by predicted success probability. No pretraining corpus of proofs is required — the system bootstraps from the user's own proof history.

**Significance:** Tactician demonstrates that useful neural theorem proving assistance can be achieved with lightweight methods integrated into an existing proof assistant workflow, without large-scale pretraining or MCTS infrastructure. This is the practitioner-accessible entry point to the neural MCTS ecosystem: a Coq user can install Tactician as an opam package and immediately benefit from learned tactic suggestions.

*Reference:*\
Blaauwbroek, Lasse, Josef Urban, and Herman Geuvers. "The Tactician: A Seamless, Interactive Tactic Learner and Prover for Coq." *Proceedings of CICM*, 2020. <https://arxiv.org/abs/2008.00120> | Code: <https://coq-tactician.github.io>

### LATS — Language Agent Tree Search

**LATS** (A. Zhou et al., 2024) is the most principled current synthesis of the Neural MCTS pattern with the LLM agent patterns introduced in Chapter 2 (ReAct, Reflexion). It uses MCTS as the planning backbone, with an LLM serving as all three essential roles simultaneously:

* **Policy network:** the LLM proposes candidate next actions (tool calls, reasoning steps, code)
* **Value function:** the LLM estimates the expected success probability of the current state given remaining steps
* **Reflective critic:** after a failed subtree, the LLM generates verbal feedback that penalizes similar future paths

The key innovation is the integration of *external feedback* — environment execution results, tool outputs, test failures — directly into the MCTS reward signal, creating a search that is grounded in actual task performance rather than a learned value approximation. This grounds the tree search in reality rather than in a potentially miscalibrated value model.

**Performance:** On HumanEval (programming), LATS achieves 92.7% pass\@1 — a substantial improvement over chain-of-thought (≈81%), Tree of Thoughts (≈88%), and ReAct (≈82%) — by replacing greedy decoding with systematic MCTS over the action space. Improvements on multi-hop QA (HotpotQA) and embodied navigation (ALFWorld) are task- and model-dependent; consult A. Zhou et al. (2024) for benchmark-specific figures.

**Practitioner note:** LATS trades latency for quality — each MCTS iteration requires multiple LLM calls. It is the right tool when task success rate matters more than response time and when the task has a verifiable success criterion such as code execution, fact verification, or plan validation.

*Reference:* Zhou, Andy, et al. "Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models." *Proceedings of ICML*, 2024. <https://arxiv.org/abs/2310.04406> | Code: <https://github.com/lapisrocks/LanguageAgentTreeSearch>

### Neural Combinatorial Optimization

The most widely deployed instance of neural subroutines in symbolic systems is not theorem proving or planning — it is **combinatorial optimization** for logistics and routing, where neural policies guide construction heuristics for NP-hard problems.

**Pointer Networks** (Vinyals et al., 2015) introduced the foundational pattern: a sequence-to-sequence model with an attention mechanism that "points" to positions in the input sequence rather than fixed-vocabulary outputs. This enables learning to solve combinatorial problems — Traveling Salesman (TSP), knapsack — where the output is a permutation of the input. Trained with supervised learning or reinforcement learning (REINFORCE), Pointer Networks produce near-optimal TSP solutions with inference time that scales as O(n²) in the number of nodes — a wall-clock advantage that grows substantially as instances scale beyond what exact solvers can handle in practical time budgets. On the 50-node instances studied in the original paper, Pointer Networks achieved near-optimal quality while running inference orders of magnitude faster than dynamic programming baselines (though not faster than optimized modern exact solvers like Concorde on such small instances).

**Attention Model** (Kool et al., 2019) scaled this to production relevance. Using a transformer-based encoder with a learned decoder, it achieves within 1% of optimal on TSP-100 and handles Vehicle Routing Problems (VRP) with capacities, time windows, and precedence constraints. Symbolic feasibility checking — verifying that a generated route does not exceed vehicle capacity and that all customers are served — is applied to all candidate solutions, discarding constraint violators before returning the result.

**The neuro-symbolic structure:**

```
Problem instance (graph)
      │
      ▼
  Neural policy
  (Attention Model)
      │ Candidate tour
      ▼
Feasibility checker   ← Symbolic oracle
(capacity, routes)
      │ Feasible tour
      ▼
Local search improvement
(2-opt, Lin-Kernighan)   ← Symbolic refinement
      │
      ▼
  Final solution
```

This three-stage pattern — neural generation, symbolic filtering, symbolic refinement — is directly analogous to AlphaCode's sample-filter-cluster loop and LLM-Modulo's generate-verify loop. The neural component handles combinatorial exploration; the symbolic components handle correctness and quality assurance.

*References:* Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer Networks." *Advances in Neural Information Processing Systems (NIPS)* 28 (2015). <https://arxiv.org/abs/1506.03134>

Kool, Wouter, Herke van Hoof, and Max Welling. "Attention, Learn to Solve Routing Problems!" *International Conference on Learning Representations (ICLR)*, 2019. <https://arxiv.org/abs/1803.08475> | Code: <https://github.com/wouterkool/attention-learn-to-route>

### GFlowNets — Diverse Combinatorial Generation

**GFlowNets** (Generative Flow Networks, Bengio et al., 2021) provide a theoretically grounded approach to diverse candidate generation for combinatorial problems.(Bengio et al., 2021) Where Pointer Networks and the Attention Model produce a single near-optimal solution via greedy construction, GFlowNets sample a *distribution* over solutions with probability proportional to a given reward function — the neural equivalent of a posterior sampler over the combinatorial space.

The neuro-symbolic structure is the same generate-filter pattern used in AlphaCode and FunSearch: the GFlowNet proposes a diverse pool of candidates; a symbolic feasibility checker and objective function score them; the best feasible candidates are returned. The key advantage over REINFORCE-based methods is theoretical: GFlowNets provably sample proportionally to reward, ensuring exploration does not collapse to a single mode. In practice, this means more chemically diverse drug candidates, more varied program hypotheses, and higher coverage of the Pareto frontier in multi-objective optimization.

**Applications:** Drug molecule generation (GFlowNet-guided generation of diverse candidates scored by molecular docking affinity), combinatorial protein design, and program synthesis — where diverse candidate programs improve the chance that at least one passes all test cases.

*Reference:* Bengio, Emmanuel, et al. "Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation." *Advances in Neural Information Processing Systems (NeurIPS)* 34 (2021). <https://arxiv.org/abs/2106.04399> | Code: <https://github.com/GFNOrg/gflownet>

### Neural SAT Solving

SAT solvers (used in bounded planning, verification, and formal methods) operate on combinatorially large search spaces. **NeuroSAT** (Selsam et al., 2019) trains a graph neural network to predict satisfiability and variable assignments from the clause graph structure of SAT instances, providing a neural "warm start" for SAT solving.

Subsequent work — including **NeuroBack** (Wang et al., 2022; arXiv:2110.14053), which integrates neural backbone-variable selection into the Kissat CDCL solver, and the **G4SATBench** evaluation framework (Li et al., ICML 2023) — has explored using neural predictions to warm-start variable ordering and unit propagation. Results vary significantly by problem family; the NeuroSAT paper itself acknowledges it is not competitive with state-of-the-art SAT solvers on industrial benchmarks, and neural guidance shows the most promise on structured synthetic instances.

*Reference:*\
Selsam, Daniel, et al. "Learning a SAT Solver from Single-Bit Supervision." *ICLR 2019*. <https://arxiv.org/abs/1802.03685> | Code: <https://github.com/dselsam/neurosat>

**SATNet** (Wang et al., 2019) takes a complementary approach: rather than training a neural network to *predict* satisfiability for an external solver, SATNet implements a **differentiable MAXSAT solver** as a neural network layer using a semi-definite programming (SDP) relaxation. This makes the SAT solver itself end-to-end trainable.

The key demonstration is **visual Sudoku**: a convolutional network perceives digits from images, SATNet solves the Sudoku constraints, and the full system is trained end-to-end on (image, solution) pairs without any intermediate supervision on individual digit labels. The constraint satisfaction is not post-processing — it is part of the forward pass. This positions SATNet as the constraint-satisfaction analogue of DeepProbLog's neural predicates: symbolic structure embedded inside the differentiable computation graph rather than layered on top of it.

> **SATNet → LDT: The progression toward tighter integration.** SATNet embeds the symbolic structure (MaxSAT/SDP) as a *layer* appended to the network. The Lattice Deduction Transformer (§4.4, Davis et al., 2026) takes the next step: the abstract domain of the constraint solver becomes the transformer's *own representation space*, with no neural/symbolic boundary at all. For practitioners who need stronger soundness guarantees and have a formal lattice structure available, §4.4 is the natural successor to the SATNet approach.

*Reference:* Wang, Po-Wei, et al. "SATNet: Bridging Deep Learning and Logical Reasoning Using a Differentiable Satisfiability Solver." *Proceedings of ICML*, 2019. <https://arxiv.org/abs/1905.12149> | Code: <https://github.com/locuslab/SATNet>

### Neural Algorithmic Reasoning: Learning to Execute Symbolic Algorithms

**Motivation**: The neural subroutines in §4.2.2 either learn *heuristics* for symbolic systems (STRIPS-HGN, neural MCTS) or learn to *solve* fixed combinatorial problems (Pointer Networks, SATNet). Neural Algorithmic Reasoning (NAR) addresses a different question: can a neural network learn to *execute* a classical symbolic algorithm — BFS, Dijkstra, sorting, dynamic programming — by treating the algorithm's data-flow structure as a training scaffold?

**The NAR framework** (Veličković et al., ICLR 2020):

* A GNN processes graphs whose node/edge features encode the algorithm's current state
* The GNN is trained to predict the algorithm's *intermediate* outputs at each execution step (not just the final answer)
* This step-level supervision enforces that the network learns the algorithm's *procedure*, not just an input-output shortcut
* At test time: the GNN runs on new graph instances and executes the algorithm in $O(|V|)$ steps

**Architecture**:

```
Input graph G = (V, E, h⁰)  [h⁰ = initial node/edge features]
       │
       ▼  Step 1
GNN layer ──► h¹  [predicted intermediate state after step 1]
       │
       ▼  Step 2
GNN layer ──► h²  [predicted state after step 2]
       │
       ⋮
       ▼  Step T
GNN layer ──► hᵀ  [output: final answer nodes/edges]

Loss: Σ_t L(hᵗ_GNN, hᵗ_algorithm)  ← step-level supervision
```

**The CLRS Benchmark** (Veličković et al., ICML 2022):

* 30 classical algorithms from the CLRS textbook (Cormen et al.) spanning sorting, searching, dynamic programming, graph algorithms, and geometry
* Provides a standardized evaluation framework: same GNN architecture, same train/test split, controlled graph sizes
* Best current systems (Triplet-GMPNN, Ibarz et al., 2022) achieve >80% accuracy on 24 of 30 tasks (per Table 1 in Ibarz et al., 2022; the same work reports over 20% improvement in average single-task performance over prior art)

**Key insight for NeSy**: NAR is a §4.2 pattern — the neural network *learns* the symbolic algorithm's behavior — but the symbolic algorithm's structure (intermediate states, invariants, loop structure) is the *training scaffold* that enables systematic generalization. Without step-level symbolic supervision, GNNs overfit to training graph sizes and fail on larger instances.

**Practical implications**:

1. NAR can serve as a fast approximate oracle for classical algorithms in symbolic planning systems — a 10× speedup over exact Dijkstra for approximately-correct shortest paths at test time
2. NAR generalizes better than neural heuristics (STRIPS-HGN) because it learns the full algorithm rather than just a value function

> \[!TIP] **When to use**: when you need fast approximate execution of a classical algorithm with known structure, and exact execution is a bottleneck; when you want systematic generalization across problem sizes via algorithmic scaffolding.

> \[!WARNING] **Limitations**:
>
> * GNNs trained on small graphs (n ≤ 16 nodes) do not always generalize to large graphs (n = 64+) even with step-level supervision
> * NAR learns to approximate the algorithm — it does not provide the algorithm's correctness guarantees
> * The symbolic algorithm must have a graph-representable state for GNN processing

*References:* Veličković, Petar, et al. "Neural Execution of Graph Algorithms." *International Conference on Learning Representations (ICLR)*, 2020. <https://arxiv.org/abs/1910.10593>

Veličković, Petar, et al. "The CLRS Algorithmic Reasoning Benchmark." *Proceedings of ICML*, 2022. <https://arxiv.org/abs/2205.15659> | Code: <https://github.com/google-deepmind/clrs>

Ibarz, Borja, et al. "A Generalist Neural Algorithmic Learner." *Learning on Graphs Conference (LoG)*, 2022. <https://arxiv.org/abs/2209.11142> (Triplet-GMPNN)

***

### Symbolic Regression — AI Feynman and PySR

**Symbolic regression** — discovering closed-form mathematical expressions from data — is the canonical example of the "Neural Helps Symbolic" paradigm applied to scientific discovery. The symbolic component is the expression search space; the neural component navigates it efficiently.

**AI Feynman** (Udrescu & Tegmark, 2020) uses physics priors as symbolic constraints to guide neural-based expression search.(Udrescu & Tegmark, 2020) Given a dataset of input-output pairs $(x\_i, y\_i)$ from a physical system, AI Feynman:

1. **Tests dimensional analysis** (symbolic): if $f(x)$ has physical units, candidate expressions must be dimensionally consistent — pruning the search space by orders of magnitude.
2. **Tests compositional symmetries** (symbolic): if $f$ is separable, translatable, or scale-invariant, decompose into simpler sub-problems.
3. **Fits neural networks** (neural): a small neural network fits the residual after each symbolic decomposition, identifying which sub-expression to target next.
4. **Searches the expression space** (symbolic): given the neural-identified structure, enumerate closed-form expressions using a Pareto-optimal approach balancing accuracy and complexity.

**Results:** AI Feynman rediscovered **100 equations** from the Feynman Lectures on Physics — including Newton's law of gravitation, the relativistic energy-momentum relation, and Maxwell's equations — from synthetic data alone. The physics-derived symbolic priors (dimensional analysis + symmetry tests) reduce the search time by 3–5 orders of magnitude compared to standard genetic programming.(Udrescu & Tegmark, 2020)

**PySR** (Cranmer, 2023) is the most practically deployed symbolic regression tool, combining evolutionary search over expression trees with GPU-accelerated fitness evaluation and a standard `scikit-learn`-compatible Python API.(Cranmer, 2023) PySR has been applied across astrophysics (discovering new scaling relations in galaxy formation simulations), genomics (identifying gene regulatory motifs), and materials science (learning inter-atomic potentials). It is the practitioner's entry point to neuro-symbolic scientific discovery: install, fit, and read the discovered equation.

```python
import pysr

model = PySRRegressor(niterations=40, binary_operators=["+", "*", "/", "-", "^"])
model.fit(X_train, y_train)
print(model.sympy())  # e.g., x0 * sin(x1) + x2^2 / 3.14
```

**Connection to FunSearch (§4.3.6).** Both PySR and FunSearch use neural guidance for symbolic space search and an exact evaluator as the fitness oracle. The difference is the search space: PySR searches mathematical expression trees; FunSearch searches Python programs. Both are instances of the same NeSy design pattern: neural generator + symbolic evaluator = structured scientific discovery.

*References:* Udrescu, Silviu-Marian, and Max Tegmark. "AI Feynman: A Physics-Inspired Method for Symbolic Regression." *Science Advances* 6.16 (2020): eaay2631. <https://doi.org/10.1126/sciadv.aay2631> | Code: <https://github.com/SJ001/AI-Feynman>

Cranmer, Miles. "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl." *arXiv preprint* arXiv:2305.01582 (2023). <https://arxiv.org/abs/2305.01582> | Code: <https://github.com/MilesCranmer/PySR>

***

### Autoformalization — LLMs Translating Informal Mathematics to Lean 4

**Autoformalization** is the task of translating natural language mathematical statements into formal proof language (Lean 4, Isabelle, Coq) that a proof assistant can verify.(Wu et al., 2022) It is the critical preprocessing step in the AlphaProof pipeline (§4.3.1): without autoformalization, AlphaProof cannot convert competition problems into Lean 4 problems that its MCTS prover can attempt.

The architecture is a direct §4.2 instantiation: the **LLM** is the neural component that generates Lean 4 syntax from natural language; the **Lean 4 type checker** is the symbolic oracle that verifies whether the generated formalization is well-typed and semantically correct.

**Llemma** (Azerbayev et al., 2024) is the first open-source language model specifically trained for mathematical reasoning and autoformalization.(Azerbayev et al., 2024) Trained on the Proof-Pile-2 corpus (55 billion tokens of scientific papers, code, and formal mathematics), Llemma achieves state-of-the-art performance on MATH (competition-style math) and MiniF2F (formal theorem proving) among open models. The key finding: *mathematical pretraining improves both informal reasoning and formal autoformalization*, suggesting shared underlying representations between natural language mathematics and formal proof syntax.

**The autoformalization loop in practice:**

```
Competition problem (natural language)
            │
            ▼
  LLM (Llemma / Gemini)
  generates Lean 4 theorem statement
            │ candidate formalization
            ▼
  Lean 4 type checker (symbolic oracle)
  verifies: is this well-typed?
  does it capture the intended meaning?
            │ if rejected, iterate
            ▼
  Verified formal problem statement
            │
            ▼
  AlphaProof MCTS (§4.3.1) attempts proof
```

**Open challenge.** Autoformalization of *problem statements* (translating the question) is more tractable than autoformalization of *solutions* (translating a human proof sketch into a checkable Lean 4 proof). The latter requires filling in proof details that the human sketch elides — a task at the frontier of current LLM capability. The formal mathematics community tracks progress via the MiniF2F benchmark (Zheng et al., 2022), which contains 488 formally stated problems from AMC, AIME, and IMO competitions.

*References:* Azerbayev, Zhangir, et al. "Llemma: An Open Language Model for Mathematics." *Proceedings of ICLR*, 2024. <https://arxiv.org/abs/2310.10631> | Model: <https://huggingface.co/EleutherAI/llemma_34b>

Wu, Yuhuai, et al. "Autoformalization with Large Language Models." *Advances in Neural Information Processing Systems (NeurIPS)* 35 (2022). <https://arxiv.org/abs/2205.12615>

Zheng, Kunhao, et al. "MiniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics." *Proceedings of ICLR*, 2022. <https://arxiv.org/abs/2109.00110> | Data: <https://github.com/openai/miniF2F>

***


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://neurosymbolicai.gitbook.io/neuro-symbolic-ai-in-practice/part-iii-core-approaches/chapter-4/4-2-neural-helps-symbolic/4-2-neural-subroutines.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.