Anthropic shipped Claude Opus 4.7 on 16 April 2026 and pushed three breaking API changes at the same time: non-default `temperature`, `top_p`, and `top_k` now return HTTP 400; the old `thinking.budget_tokens` interface is gone in favour of adaptive thinking; and assistant message prefill is no longer accepted. A new tokenizer inflates the same text by 1.0–1.35×, and the default behaviour is now literal instruction following—the model will not silently generalise or fill gaps the way 4.6 did.

FDRP is a method catalogue built up over years on Opus 4.6. A behavioural shift of that size deserves an audit. Run #1028 ran that audit head-to-head against 4.6: six HIGH-priority methods, same prompt file, same file-based dispatch via `claude -p` with `< /dev/null`, 180-second budget per call.
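For anyone reproducing the setup, here is a minimal sketch of the paired dispatch. Only the `claude -p` print-mode call, the closed stdin, and the 180-second budget come from the run itself; the model identifiers, prompt path, and output layout are illustrative assumptions.

```python
# Paired dispatch sketch for run #1028. Model IDs and file paths are
# illustrative assumptions; the print-mode call, closed stdin, and the
# 180 s timeout mirror the run's setup.
import subprocess
from pathlib import Path

PROMPT = Path("prompts/bridges.md").read_text()   # same prompt file for both
MODELS = ["opus-4.6", "opus-4.7"]                 # hypothetical identifiers
Path("out").mkdir(exist_ok=True)

for model in MODELS:
    out = Path(f"out/{model}.md")
    try:
        result = subprocess.run(
            ["claude", "--model", model, "-p", PROMPT],
            stdin=subprocess.DEVNULL,   # the run's `< /dev/null`
            capture_output=True,
            text=True,
            timeout=180,                # per-call budget from the run
        )
        out.write_text(result.stdout)
    except subprocess.TimeoutExpired:
        out.write_text("")              # zero bytes, as DDn produced on 4.7
        print(f"{model}: timed out at 180 s")
```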
BLUF. Net verdict is neutral with selective improvement. 4 of the 6 tested methods produced clean paired output. 3 improved on 4.7 (Six Thinking Hats, Bridges, Triage Cascade). 1 was structurally equivalent (Fractal-Out S0–S6). 1 broke on 4.7 only (DDn — completion timeout, not a content regression). 0 methods retired.
## What changed in 4.7
The behavioural shifts matter more than the API breakages for us. We were worried about three things in particular:
- Literalism. 4.7 “will not infer requests you didn’t make.” Methods like Bridges (cross-domain absorption) depend on the model making analogical leaps the prompt did not spell out. Literalism could suppress that.
- Fewer subagents by default. PEAR and expert expansion both depend on the model spawning specialists. If 4.7 under-spawns, those methods degrade silently.
- Fewer tool calls by default. Wave-based methods (Fractal-Out, concept2X) need aggressive tool use. Under-calling would compress the method’s coverage.
The way to answer those worries is not to reason about them; it is to run the prompts and measure the diff.
## Liviu-named methods: Bridges and Fractal-Out
Two methods were named directly by Liviu, which made them the most important checks in the run: Bridges (method #7, cross-domain absorption) and Fractal-Out S0–S6 (method #8, scale stress testing). Both are explicit bets on 4.6’s inferential looseness. If literalism broke them, the catalogue had a real problem.
### Bridges — SHARPER on 4.7
Prompt: “Name 5 adjacent fields that have solved a structurally identical problem” (the concrete instance was Kaggle PS-S6E4 irrigation-need prediction with OOF-LB divergence). 4.6 returned a broad-field list with foundational citations — Tobin 2017 domain randomization, Heckman 1979, Stuart 2011 IPSW, Blöschl 2013, Long 2015 MMD. Good, but generic.
4.7 returned a deeper-specialist list with operator-level prescriptions: Siddiqi Intelligent Credit Scoring, Pan & Yang TKDE 2010, FAO-56 Penman-Monteith (Allen 1998), Makridakis M5 IJF 2022, Elkan IJCAI 2001 — and named executable operators like “adversarial-weight CV”, “ET0 as monotone constraint feature in LightGBM”, and “isotonic-over-pinball at multiple τ”. 13 % shorter in words, substantially more useful per word. Zero hallucinated references on spot-check.
Literalism did not kill Bridges. If anything, it tightened it: 4.7 delivers exactly what you ask for (5 fields, clean numbering, domain specialisation) without padding. Prompt-rework recommendation: none.
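To make one of those operators concrete: below is a minimal sketch of adversarial-weight CV, the first operator 4.7 named. The column alignment, the LightGBM probe, and the clipping constant are illustrative assumptions, not the recipe from the 4.7 output.

```python
# Adversarial-weight CV sketch: weight training rows by how "test-like" an
# adversarial probe classifier finds them. All modelling choices here are
# illustrative assumptions.
import numpy as np
import lightgbm as lgb

def adversarial_weights(X_train: np.ndarray, X_test: np.ndarray) -> np.ndarray:
    X = np.vstack([X_train, X_test])
    y = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]  # 1 = test row
    probe = lgb.LGBMClassifier(n_estimators=200)
    probe.fit(X, y)
    p = probe.predict_proba(X_train)[:, 1]   # P(train row resembles test)
    return p / (1.0 - p + 1e-9)              # density-ratio style weights
```

Rows the probe confidently flags as train-only get down-weighted, which is the lever aimed directly at the OOF-LB divergence that motivated the prompt; the weights then feed the CV fit via `sample_weight`.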
### Fractal-Out S0–S6 — 4.7-SAFE
Prompt: stress-test a concept (“PIO-router with deterministic O(1) dispatch”) at all scales S0 (solo user) through S6 (civilizational). One sentence per scale on whether it still holds, what breaks first, and the minimum rearchitecture needed.
The literalism concern was specific here: would 4.7 skip scales if it didn’t “need” to answer them? The answer was no. Both models addressed all seven scales cleanly. The only disagreement was architectural: 4.6 placed the first failure at S3 (~500 agents, classification ambiguity + capability drift) while 4.7 placed it at S4 (continental, specialist scarcity + cross-region fairness). That is a legitimate architectural disagreement, not model drift — 4.7 is more optimistic about hierarchical federated routers at S3.
As a side observation, 4.7 self-reports its word count at the end of the output (“(348 words)”) when the prompt includes a length constraint. 4.6 doesn’t. A mild literalism signal, but harmless. Prompt-rework recommendation: none.
## The one regression: DDn
Double-Diamond-to-the-nth is method #1. It is a recursive Discover/Define/Develop/Deliver expansion where each phase can itself be a nested diamond. On 4.6 it runs in 42 seconds with a clean 391-word output covering all phases and one level of recursion. On 4.7 it timed out at 180 seconds on two independent attempts, producing zero bytes.
This looks like a generation-length regression, not a reasoning one. The hypothesis is that 4.7’s literalism plus adaptive thinking causes it to expand every nested phase more aggressively than 4.6 did, pushing past the completion budget. The fix is prompt-level, not method-level:
- Cap phase word budget explicitly (“≤40 words per phase, total ≤400 words”).
- Flatten the recursion into two dispatches: depth-1 first, then a targeted depth-2 on the critical phase (sketched after this list).
- Opt in to the `task-budgets-2026-03-13` beta header (minimum 20k budget) to extend the completion window.
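A minimal sketch of that flattened dispatch, reusing the harness above. Prompt wording, `task.md`, and the CRITICAL-marker convention are illustrative assumptions, not the P1-ddn.md rework itself.

```python
# DDn flattened into two bounded dispatches instead of one recursive
# expansion. Prompt wording, task.md, and the CRITICAL-marker convention
# are illustrative assumptions.
import subprocess

BUDGET = "Hard cap: ≤40 words per phase, total ≤400 words."

def dispatch(prompt: str) -> str:
    return subprocess.run(
        ["claude", "-p", prompt],
        stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=180,
    ).stdout

# Dispatch 1: a single depth-1 pass over Discover/Define/Develop/Deliver.
depth1 = dispatch(
    "Run one Double Diamond pass on the task below. No nested diamonds. "
    f"Mark the riskiest phase CRITICAL. {BUDGET}\n\n" + open("task.md").read()
)

# Dispatch 2: a targeted depth-2 expansion of only the CRITICAL phase.
depth2 = dispatch(
    "Expand only the phase marked CRITICAL in this depth-1 output into a "
    f"nested diamond. Leave the other phases untouched. {BUDGET}\n\n{depth1}"
)
print(depth2)
```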
This goes on the rework list as `designs/canonical-prompts-v2/P1-ddn.md`. Until the rewrite lands, DDn in production is routed through 4.6.
## Full method health table
| Method | 4.6 verdict | 4.7 verdict | Rework? |
|---|---|---|---|
| Bridges (#7) | validated | sharper | no |
| Fractal-Out S0–S6 (#8) | validated | validated | no |
| Six Thinking Hats (#5) | validated | sharper | no |
| Triage Cascade (#21) | validated | sharper | no |
| DDn (#1) | validated | needs rework | yes — prompt |
| Pre-Mortem (#6) | prompt-bug | prompt-bug | yes — unrelated to 4.7 |
Pre-Mortem is worth a note: both models timed out. The prompt asks for 10 failure modes with 3 sub-items each — 30 structured paragraphs in one shot, more than either model can generate within the 180-second budget. The fix is to chunk it into two dispatches, the same pattern as the DDn sketch above: “top 5 severity-weighted failure modes” first, then “next 5”. This has been on the list since 4.6; R1028 merely caught it again as a side effect.
## Cross-cutting patterns
Averaging across the 4 method-pairs where both models completed, 4.7 output is ~12 % shorter (370 vs 420 words) but carries more citations, more technical symbols, and more doctrine-integrated references (BIND-IDs) per word. Denser, not dumber. Anthropic’s stated “more direct tone, less validation-forward phrasing” shift is visible in the diffs: phrases like “This feels like insurance theater” became “smells like ceremony over substance” — same content, fewer hedges.
Format discipline is another visible shift: 4.7 auto-applies markdown hierarchy (H1/H2) on every output. 4.6 used bold-prefix paragraphs instead. Neither format was requested. For pipeline parsing, this is a small but real win.
What this test did not probe: multi-turn tool-use loops, wave-dispatch, and the reported reductions in subagent spawning. All six prompts were single-shot pure-reasoning. The next wave of testing needs tool-using prompts and an explicit PEAR-panel dispatch to close that gap.
## What we’re doing with the result
- Upgrade to 4.7 as the default for FDRP reasoning work. The ~12 % length reduction plus the quality gains on Bridges, Six Hats, and Triage are worth the 1.0–1.35× tokenizer inflation.
- Route DDn through 4.6 until the rewritten prompt (`P1-ddn.md`) lands.
- Chunk Pre-Mortem into two dispatches regardless of model.
- Raise `max_tokens` to ≥64k at `max`/`xhigh` effort per Anthropic’s migration guide (a sketch follows this list).
- Remove sampling-parameter workarounds everywhere (already automated via the bundled `claude-api` migration skill).
- Book a next-wave test covering PEAR, the Kaggle-GM approach, generative kernel injection, and at least one tool-using prompt to measure the subagent/tool-call axes this round missed.
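As a concrete illustration of the `max_tokens` and beta-header items, a hedged SDK sketch: the model ID, the effort field, and the header value mirror this post’s claims rather than verified API surface, while `messages.create`, `extra_headers`, and `extra_body` are standard Python SDK affordances.

```python
# Migration sketch: raised max_tokens plus the completion-window beta header.
# Model ID, effort value, and header string are assumptions from this post.
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-opus-4-7",                      # assumed model ID
    max_tokens=64_000,                            # >=64k per the item above
    messages=[{"role": "user", "content": "..."}],
    extra_headers={"anthropic-beta": "task-budgets-2026-03-13"},
    extra_body={"effort": "xhigh"},               # field name is a guess
)
# No temperature/top_p/top_k overrides: non-default values now return HTTP 400.
```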
The broader point: FDRP is meant to survive model turnover. Methods that depend on a specific model’s quirks are fragile. Methods that work on both 4.6 and 4.7 — and most of ours do — are what the architecture is supposed to produce. Run #1028 is the first reflexive check of that hypothesis against a live model migration. The catalogue survived.