Recent advances in large language models have demonstrated remarkable reasoning capabilities through Chain of Thought (CoT) prompting, but often at the cost of excessive verbosity in their intermediate outputs, which increases computational overhead. We introduce Sketch-of-Thought (SoT), a novel prompting framework that combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage while preserving reasoning accuracy. SoT is designed as a flexible framework that can incorporate any custom reasoning paradigms based on cognitive science, and we instantiate it with three such paradigms - Conceptual Chaining, Chunked Symbolism, and Expert Lexicons - each tailored to different reasoning tasks and selected dynamically via a lightweight routing model. Through comprehensive evaluation across 15 reasoning datasets with multiple languages and multimodal scenarios, we demonstrate that SoT achieves token reductions of 76% with negligible accuracy impact. In certain domains like mathematical and multi-hop reasoning, it even improves accuracy while using significantly fewer tokens. Our code is publicly available: https://www.github.com/SimonAytes/SoT.
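For anyone wondering what "selected dynamically via a lightweight routing model" could look like in practice, here is a rough sketch. It is not the authors' code: the abstract only names the three paradigms, so the keyword-based `route` function is a hypothetical stand-in for their learned router, and the instruction strings are invented placeholders rather than the paper's actual prompts.

```python
# Paradigm names come from the abstract. The instruction strings are
# invented placeholders, NOT the prompts used in the paper.
PARADIGM_INSTRUCTIONS = {
    "conceptual_chaining": "Reason as a short chain of linked concepts, e.g. A -> B -> C.",
    "chunked_symbolism": "Reason with variables and compact equations, one step per line.",
    "expert_lexicons": "Reason in terse domain shorthand and standard abbreviations.",
}

# Hypothetical domain keywords, chosen only for illustration.
DOMAIN_TERMS = ("patient", "dosage", "circuit", "voltage")


def route(question: str) -> str:
    """Pick a paradigm name. A keyword heuristic standing in for a learned router."""
    text = question.lower()
    if any(ch.isdigit() for ch in text):
        return "chunked_symbolism"  # numbers suggest a math-style problem
    if any(term in text for term in DOMAIN_TERMS):
        return "expert_lexicons"  # specialist vocabulary
    return "conceptual_chaining"  # default: multi-hop / commonsense


def build_prompt(question: str) -> str:
    """Prepend the selected paradigm's instruction to the question."""
    instruction = PARADIGM_INSTRUCTIONS[route(question)]
    return f"{instruction}\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("A train travels 60 km in 2 hours. What is its speed?"))
```

The point is only the shape of the pipeline: classify the question cheaply, then constrain the model's reasoning style with a short instruction so the intermediate output stays compact.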
So this is what I’m excited about in AI.
LLMs are statistical machines that simply output reasonable sequences of tokens. Useful! Not particularly smart, but they approximate language. I think it proves that a great majority of what humans do is learned sequences of behaviors.
But now we’re working on corralling that statistical language into workflows that improve the reasoning of the output. These are the first experiments into what makes thinking actually work. Is it iteratively refining a rough concept (like we’re seeing in this paper)? Or is it subdividing tasks into more easily solved problems (like the Atom of Thoughts paper)?
Once we find something that works, a real theory of intelligence seems much more likely to emerge. If that happens, I wouldn’t be surprised to see LLMs die out in favor of something far simpler and more efficient.