Teaching Models to Reason Through Code

March 8, 2026 ● MacroDeep Research

A model that jumps directly from a bug report to a code change is brittle. A model that first reasons about what the bug is, where it lives, and what fix would be safe — that model generalizes. Teaching that reasoning is one of the hardest parts of training a capable coding model.

The case for thinking before coding

When an experienced engineer receives a bug report, they don't immediately open a text editor. They form a hypothesis. They trace execution. They consider what could go wrong with a naive fix. This process happens in their head, but it's real work — and it's why experienced engineers make better changes than junior ones, even when the junior engineer knows the syntax better.

We wanted Nico 2.5 to have a similar capability: to produce explicit reasoning before committing to a code change.

Training reasoning through distillation

We used a technique sometimes called “thinking trace distillation” — generating step-by-step reasoning from stronger teacher models, then training Nico 2.5 to produce similar chains of thought for the same problems.
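To make the data flow concrete, here is a minimal sketch of how a distillation set might be assembled. It is an illustration under stated assumptions, not our actual pipeline: `TraceExample`, `build_distillation_set`, `to_training_text`, and the teacher's call signature are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical teacher interface: takes a problem statement and returns
# (reasoning_trace, final_answer). A real teacher would be a model call.
Teacher = Callable[[str], Tuple[str, str]]


@dataclass
class TraceExample:
    problem: str
    reasoning: str
    answer: str


def build_distillation_set(problems: List[str], teacher: Teacher) -> List[TraceExample]:
    """Ask the teacher for a reasoning trace and an answer for every problem."""
    examples = []
    for problem in problems:
        reasoning, answer = teacher(problem)
        examples.append(TraceExample(problem, reasoning, answer))
    return examples


def to_training_text(example: TraceExample) -> str:
    """Lay out the target so the reasoning comes before the answer.

    The section labels are an assumption made for this sketch.
    """
    return (
        f"Problem:\n{example.problem}\n\n"
        f"Reasoning:\n{example.reasoning}\n\n"
        f"Answer:\n{example.answer}"
    )
```

The ordering in `to_training_text` is the part that matters: the student is trained to emit the reasoning first and the code change second, so the change is conditioned on the reasoning rather than the other way around.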

The insight is that the reasoning itself is a kind of curriculum. When the model learns to articulate “the bug is likely here because the null check happens after the array access,” it's learning a pattern of fault localization that transfers across problems — not just memorizing a fix.
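For readers who want to see the quoted pattern in code, here is a small Python rendering of it, with `None` playing the role of null. The function names are made up for this illustration.

```python
from typing import List, Optional


def first_name_buggy(users: Optional[List[dict]]) -> Optional[str]:
    # Bug: the element access runs before the guard, so the guard below
    # can never protect it. None raises TypeError, [] raises IndexError.
    name = users[0]["name"]
    if users is None or len(users) == 0:
        return None
    return name


def first_name_fixed(users: Optional[List[dict]]) -> Optional[str]:
    # Fix: check for None or an empty list first, then access the element.
    if not users:
        return None
    return users[0]["name"]
```

A fix that only wraps the access in a try/except would silence the crash without correcting the ordering. Reasoning about where the check sits relative to the access is what points to the second version.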

Multiple teacher models

Different teacher models have different reasoning styles — and that diversity is a feature, not a bug. A model that reasons through problems the same way every time will find certain problem shapes easy and others hard. By training on traces from multiple teachers, we expose Nico 2.5 to a variety of reasoning strategies and help it develop flexibility.

We found that the quality and depth of reasoning traces varied significantly between teachers, and that the variation itself was valuable for training robustness.
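One simple way to combine traces from several teachers is sketched below. This post does not specify how the mix was balanced, so the per-teacher cap here is an assumption, and `mix_teacher_traces` is a hypothetical helper rather than part of any real pipeline.

```python
import random
from typing import Dict, List, Tuple


def mix_teacher_traces(
    traces_by_teacher: Dict[str, List[str]],
    per_teacher: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Sample up to `per_teacher` traces from each teacher, then shuffle.

    Capping each teacher's share (an assumption of this sketch) keeps a
    prolific teacher from crowding out the other reasoning styles.
    """
    rng = random.Random(seed)
    mixed: List[Tuple[str, str]] = []
    # Sort teacher names so the result is reproducible for a given seed.
    for teacher, traces in sorted(traces_by_teacher.items()):
        count = min(per_teacher, len(traces))
        for trace in rng.sample(traces, count):
            mixed.append((teacher, trace))
    rng.shuffle(mixed)
    return mixed
```

Keeping the teacher name next to each trace makes it possible to check the balance of the final mix afterwards.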

What we observe in practice

Models trained with reasoning traces show meaningfully better performance on problems that require multi-step analysis — particularly debugging scenarios where the error message is misleading or where the root cause is several function calls away from the symptom.

They also fail more gracefully. When it gets a problem wrong, a model without reasoning training tends to answer confidently anyway. A model trained to reason will more often produce a tentative answer with explicit uncertainty, which is far more useful in an agentic context where the model needs to decide whether to ask for more information or proceed.
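The ask-or-proceed decision can be sketched as a small routing function. Everything here is hypothetical: this post does not describe how uncertainty is represented, so the numeric `confidence` field, the `open_questions` list, and the 0.7 threshold are assumptions made only to show the shape of the decision.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Action(Enum):
    PROCEED = "proceed"
    ASK = "ask_for_more_information"


@dataclass
class Proposal:
    patch: str
    confidence: float  # hypothetical self-reported score in [0, 1]
    open_questions: List[str] = field(default_factory=list)


def choose_action(proposal: Proposal, threshold: float = 0.7) -> Action:
    """Ask when the model voiced open questions or low confidence."""
    if proposal.open_questions or proposal.confidence < threshold:
        return Action.ASK
    return Action.PROCEED
```

A model that always reports high confidence gives this function nothing to work with, which is why explicit uncertainty is useful to an agent loop.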

Limitations

Reasoning traces add substantial cost and complexity to training. They also require careful quality control — a reasoning trace that leads to the right answer via wrong logic can mislead the model in subtle ways. We're still developing better methods for validating trace quality, and we expect this to be an active area of research for the next few model generations.
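As a rough illustration of why checking the final answer alone is not enough, the sketch below adds one crude consistency check: does the line the trace blames appear among the lines the patch actually changed? This heuristic and the `Trace` fields are assumptions for the example, not the validation method we use.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Trace:
    claimed_fault_line: int   # line the reasoning identifies as the root cause
    patched_lines: Set[int]   # lines the final code change touched
    tests_pass: bool          # whether the patched code passes its tests


def classify_trace(trace: Trace) -> str:
    """Sort a trace into reject, flag, or keep."""
    if not trace.tests_pass:
        return "reject"  # wrong answer, so the trace is not used
    if trace.claimed_fault_line not in trace.patched_lines:
        # Right answer, but the reasoning blamed a line the fix never
        # touched. This trace may teach the wrong localization pattern.
        return "flag"
    return "keep"
```

A check this simple will misfire, for example when a correct fix belongs a few lines away from the fault, so flagged traces would go to closer review rather than being discarded automatically.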
