1. Current Generative Pre-trained Transformer Architecture
Given a vocabulary $V$ with $|V| = v$, current models map a token sequence to a matrix of $d$-dimensional embeddings:

$$(t_1, \ldots, t_n) \mapsto X \in \mathbb{R}^{n \times d}$$
These pass through layers of attention transformations:

$$\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V$$

where $Q = XW_Q$, $K = XW_K$, $V = XW_V$.
Training optimizes:

$$\max_{\theta} \sum \log P(t_{n+1} \mid t_1, \ldots, t_n; \theta)$$
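For concreteness, here is a minimal NumPy sketch of the two formulas above, the attention map and the log-likelihood objective; the toy dimensions, parameter matrices, and helper names are illustrative assumptions rather than the internals of any particular model.

```python
# Sketch of the transformer computations described above (NumPy).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_Q, W_K, W_V):
    """softmax(Q K^T / sqrt(d)) V with Q = X W_Q, K = X W_K, V = X W_V."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def log_likelihood(probs, targets):
    """sum_i log P(t_{i+1} | t_1..t_i): the quantity the model maximizes.

    probs:   (n, v) next-token distributions produced by the model
    targets: (n,)   indices of the tokens that actually follow
    """
    return float(np.log(probs[np.arange(len(targets)), targets]).sum())

# Toy usage: n = 4 tokens, d = 8 dimensions, v = 16 vocabulary entries.
rng = np.random.default_rng(0)
n, d, v = 4, 8, 16
X = rng.normal(size=(n, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
print(attention(X, W_Q, W_K, W_V).shape)                       # (4, 8)
dummy_probs = softmax(rng.normal(size=(n, v)))
print(log_likelihood(dummy_probs, rng.integers(v, size=n)))    # a (negative) log-likelihood
```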
2. Required Reformulation
Instead, construct Abstract Syntax Trees where each node η must satisfy:
$$\eta \in \{\text{Noun}, \text{Verb}, \text{Adjective}, \text{Conjunction}, \ldots\}$$
With composition rules $R$ such that for nodes $\eta_1, \eta_2$:

$$R(\eta_1, \eta_2) = \begin{cases} \text{valid\_subtree} & \text{if grammatically valid} \\ \emptyset & \text{otherwise} \end{cases}$$
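A minimal sketch of what such a composition rule R could look like in code: the node categories come from the set above, but the `Node` class and the tiny `PRODUCTIONS` grammar table are hypothetical placeholders, not a worked-out grammar.

```python
# Illustrative composition rule R over a toy grammar table.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Cat(Enum):
    NOUN = auto()
    VERB = auto()
    ADJECTIVE = auto()
    CONJUNCTION = auto()

@dataclass
class Node:
    cat: Cat
    text: str
    children: tuple = ()

# Which category pairs may combine, and the category of the resulting subtree.
PRODUCTIONS = {
    (Cat.ADJECTIVE, Cat.NOUN): Cat.NOUN,   # "red ball"   -> noun phrase
    (Cat.NOUN, Cat.VERB): Cat.VERB,        # "ball rolls" -> clause
}

def R(n1: Node, n2: Node) -> Optional[Node]:
    """Return a valid subtree if (n1, n2) is grammatically composable, else None (the ∅ case)."""
    result_cat = PRODUCTIONS.get((n1.cat, n2.cat))
    if result_cat is None:
        return None
    return Node(result_cat, f"{n1.text} {n2.text}", (n1, n2))

print(R(Node(Cat.ADJECTIVE, "red"), Node(Cat.NOUN, "ball")))   # a valid subtree
print(R(Node(Cat.VERB, "rolls"), Node(Cat.ADJECTIVE, "red")))  # None, i.e. ∅
```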
And logical constraints $L$ such that for any subtree $T$:

$$L(T) = \begin{cases} T & \text{if logically consistent} \\ \emptyset & \text{if contradictory} \end{cases}$$
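A correspondingly minimal sketch of the constraint L, under the simplifying assumption that a subtree exposes the set of atomic propositions it asserts; the `~p` negation convention is purely illustrative.

```python
# Illustrative logical constraint L over a set of asserted propositions.
from typing import Optional

def L(assertions: set[str]) -> Optional[set[str]]:
    """Return the assertions unchanged if consistent, None (the ∅ case) if contradictory.

    A proposition "p" together with its negation "~p" counts as a contradiction.
    """
    for prop in assertions:
        negated = prop[1:] if prop.startswith("~") else "~" + prop
        if negated in assertions:
            return None
    return assertions

print(L({"sky_is_blue", "grass_is_green"}))   # consistent: returned unchanged
print(L({"sky_is_blue", "~sky_is_blue"}))     # contradictory: None
```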
3. Parsing and Generation
Input text $s$ maps to a valid AST $T$ or an error $E$:

$$\mathrm{parse}(s) = \begin{cases} T & \text{if } \exists \text{ valid AST} \\ E(\text{closest\_valid}, \text{violation}) & \text{otherwise} \end{cases}$$
Generation must traverse only valid AST constructions:

$$\mathrm{generate}(c) = \{\, T \mid R(T) \neq \emptyset \wedge L(T) \neq \emptyset \,\}$$
where c is the context/prompt.
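A toy end-to-end sketch of parse and generate in the same spirit: the `LEXICON`, the `PRODUCTIONS` table, and the `ParseError` structure are placeholders standing in for a real grammar and error type, and generation is reduced to enumerating admissible compositions.

```python
# Illustrative parse() and generate() over a toy lexicon and grammar.
from dataclasses import dataclass
from typing import Union

LEXICON = {"red": "ADJ", "ball": "NOUN", "rolls": "VERB"}
PRODUCTIONS = {("ADJ", "NOUN"): "NOUN", ("NOUN", "VERB"): "CLAUSE"}

@dataclass
class ParseError:
    closest_valid: list
    violation: str

def parse(s: str) -> Union[list, ParseError]:
    """Map input text to a valid category derivation or a structured error."""
    cats = []
    for word in s.split():
        if word not in LEXICON:
            return ParseError(cats, f"unknown token: {word!r}")
        cats.append(LEXICON[word])
    # Greedy left-to-right reduction using the composition table.
    while len(cats) > 1:
        head = PRODUCTIONS.get((cats[0], cats[1]))
        if head is None:
            return ParseError(cats, f"no rule for {cats[0]} + {cats[1]}")
        cats = [head] + cats[2:]
    return cats

def generate(context_cat: str) -> list[tuple]:
    """Enumerate only the compositions the grammar admits from the given category."""
    return [(a, b, out) for (a, b), out in PRODUCTIONS.items() if a == context_cat]

print(parse("red ball rolls"))   # ['CLAUSE']
print(parse("rolls red"))        # ParseError(..., violation='no rule for VERB + ADJ')
print(generate("NOUN"))          # [('NOUN', 'VERB', 'CLAUSE')]
```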
4. Why Current GPT Fails
The statistical model

$$\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V$$

has no inherent conception of:

- Syntactic validity
- Logical consistency
- Conceptual preservation
It merely maximizes

$$P(t_{n+1} \mid t_1, \ldots, t_n)$$

based on training patterns, with no guaranteed constraints on the resulting factorization

$$\prod_{i=1}^{n} P(t_i \mid t_1, \ldots, t_{i-1})$$
This allows generation of:

- Grammatically invalid sequences
- Logically contradictory statements
- Conceptually inconsistent responses
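To make the contrast concrete, here is a bare-bones decoding loop of the kind implied by the factorization above; the only thing it ever consults is the next-token distribution. The `toy_model` stand-in and the sampling details are, of course, assumptions for illustration.

```python
# Sampling from the product of conditionals, with no grammar or consistency check.
import numpy as np

def decode(model, prompt_ids, steps, seed=0):
    """Sample token ids one at a time from the model's conditionals.

    Note what is absent: no parser, no composition rule, no contradiction
    check -- each step only looks at P(t_{n+1} | t_1, ..., t_n).
    """
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    for _ in range(steps):
        probs = model(ids)                              # next-token distribution
        ids.append(int(rng.choice(len(probs), p=probs)))
    return ids

# Toy stand-in model: a fixed uniform distribution over a 5-token vocabulary.
toy_model = lambda ids: np.full(5, 0.2)
print(decode(toy_model, [0], steps=4))
```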
5. Conclusion
The fundamental flaw is attempting to learn syntax and logic from data rather than building them into the architecture. An AST-based approach with formal grammar rules and logical constraints must replace unconstrained statistical token prediction.