AutoFormalization

The unsolved subproblem at the heart of LLM-to-symbolic neurosymbolic AI

Converting natural-language statements into faithful formal representations a solver can process.

What it is

AutoFormalization is the task of converting an informal, natural-language statement — a math problem, a planning goal, a constraint specification, a legal rule — into a formal symbolic representation a downstream solver can process. Common targets include first-order logic, the Planning Domain Definition Language (PDDL), the satisfiability-solver format SMT-LIB, and domain-specific theorem-proving languages such as Lean or Coq.

In the neurosymbolic cycle, autoformalization is the gateway step for the LLM → Symbolic pattern: an LLM acts as the translator and a dedicated solver does the reasoning. The solver only delivers guarantees if the translation is faithful.

Why it matters

Autoformalization is where the determinacy of the symbolic side meets the indeterminacy of the LLM side. The LLM can hallucinate the formal representation as easily as it can hallucinate a wrong direct answer — a misnamed predicate, a mis-quantified variable, a wrong type signature — and the solver returns confidently wrong output that looks rigorously derived.

Yang et al.’s 2025 NSAI survey flags consistency and efficiency of autoformalization as the central open challenges in the LLM→Symbolic pattern. Active research targets include first-order logic, mathematical statements and proofs, PDDL, and symbolic world models.

For practitioners deploying NSAI pipelines, autoformalization is the failure surface to instrument and verify. Every downstream guarantee is conditional on this step.

What it is

Why it matters

Related concepts