2026-07-05 · seasonality · USDJPY · gotobi · Tokyo fix · backtest · case-study

The Tokyo Fix Anomaly: a Real Edge That Still Didn't Make the Cut

We tested the famous 'gotobi' USDJPY anomaly on 7 years of tick data. It's real — and we still rejected it. Here's the whole story, including the bug we found along the way.

Full interactive results: equity curves, drawdown & every candidate →

Every so often a trading anomaly comes with an actual paper trail. The gotobi effect is one of them: on Japanese calendar days divisible by 5 — the 5th, 10th, 15th, 20th, 25th and 30th — importers settle dollar-denominated invoices, and their banks buy USD ahead of the Tokyo fix at 9:55 JST. The result is systematic USDJPY buying pressure into the fix on those days, documented by Ito & Yamada in NBER WP 22820 and revisited in arXiv:2301.13204. The flow is commercial, not speculative — the classic argument for why an anomaly can stay alive after everyone knows about it.

That made it the perfect first test for our research pipeline: mechanical rules, an academic anchor, and a clock with no DST ambiguity (JST never shifts, so the fix is always 00:55 UTC).

The rules we tested

Trade only gotobi days; weekend dates shift to the preceding Friday, per Japanese settlement convention
Buy USDJPY N hours before the fix — N is the parameter we optimized
One trade per fix, long only
Hard exit at the fix, win or lose — the documented pressure ends there
Fixed stop loss (15–60 pips explored), 1% of equity risked per trade
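The calendar and timing rules above are mechanical enough to write down directly. Below is a minimal Python 3.9+ sketch that uses only the standard library. The function names are hypothetical and this is not the pipeline's actual code. It also assumes two simplifications: Japanese public holidays are ignored, and nothing checks whether the computed entry falls inside market hours. Because Japan observes no daylight saving time, a fixed UTC+9 offset is enough to place the 09:55 JST fix at 00:55 UTC all year.

```python
import calendar
from datetime import date, datetime, time, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan has no DST, so a fixed offset is exact
FIX_JST = time(9, 55)               # Tokyo fix, 09:55 JST (= 00:55 UTC)

def gotobi_trade_days(year: int, month: int) -> list[date]:
    """Days divisible by 5 in one month; Saturday/Sunday dates move back to Friday.

    Japanese public holidays are NOT handled (simplifying assumption).
    """
    last_day = calendar.monthrange(year, month)[1]
    days = []
    for day in range(5, last_day + 1, 5):
        d = date(year, month, day)
        if d.weekday() >= 5:                      # 5 = Saturday, 6 = Sunday
            d -= timedelta(days=d.weekday() - 4)  # step back to Friday (weekday 4)
        days.append(d)
    return days

def entry_time_utc(trade_day: date, hours_before: int) -> datetime:
    """Long entry N hours before the Tokyo fix on `trade_day`, in UTC."""
    fix_utc = datetime.combine(trade_day, FIX_JST, tzinfo=JST).astimezone(timezone.utc)
    return fix_utc - timedelta(hours=hours_before)

for d in gotobi_trade_days(2023, 7):
    print(d, entry_time_utc(d, hours_before=9).isoformat())
```

In July 2023 the 15th and 30th fall on a weekend, so they become Friday the 14th and Friday the 28th. With N = 9, each entry lands at 15:55 UTC on the previous calendar day.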

How we tested it

Dukascopy tick data, real-tick modeling, a $10,000 starting balance, and commission of $3.50 per side per lot. Optimization ran on the 2019–2022 in-sample window only. We selected parameter plateaus rather than peaks: a candidate only counts if its neighbors are profitable too, not just the single best pass. The 2022–2026 out-of-sample window stayed locked until the candidates were frozen, and then each one got exactly one out-of-sample run.
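The plateau rule can be illustrated in one dimension. This is a hypothetical sketch: it scans a grid of "hours before fix" values and keeps a value only if it and its immediate neighbors are all profitable in-sample. The real optimization also varied the stop loss, and the profit figures below are made up for illustration. They are not our results.

```python
def plateau_candidates(results: dict[int, float], radius: int = 1) -> list[int]:
    """Return parameter values whose whole neighborhood is profitable.

    `results` maps a parameter value (e.g. hours before the fix) to in-sample
    net profit. A value qualifies only if it and its `radius` neighbors on
    each side are all > 0. Values at the grid edge lack a full neighborhood
    and are skipped.
    """
    keys = sorted(results)
    keep = []
    for i in range(radius, len(keys) - radius):
        window = keys[i - radius : i + radius + 1]
        if all(results[k] > 0 for k in window):
            keep.append(keys[i])
    return keep

# Hypothetical in-sample net profit (USD) by hours-before-fix; NOT real results.
grid = {1: -200, 2: 150, 3: 900, 4: 1300, 5: -80, 6: 2500,
        7: -50, 8: 1500, 9: 1700, 10: 600, 11: -100}
print(plateau_candidates(grid))
```

On this made-up grid the function returns `[3, 9]`. The single best pass (6 hours, +$2,500) is rejected because both of its neighbors lose money, which is exactly the kind of isolated spike plateau selection is meant to filter out.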

Three candidates survived in-sample selection, all with net profit factors above 2.0. Then the out-of-sample window delivered its verdict:

Candidate	In-sample 2019–22	Out-of-sample 2022–26	Outcome
Enter 4h before fix	PF 2.10, +$1,295	PF 0.79, −$1,130	Collapsed
Enter 9h before fix	PF 2.04, +$1,668	PF 1.36, +$2,171	Real edge
Enter 8h, tight stop	PF 2.10, +$4,125	PF 1.03, breakeven	No edge after costs

Two candidates died. That's the headline.

All three sat on genuine in-sample plateaus with profitable neighborhoods. Robustness checks passed. And still, the 4-hour entry — the best-looking region by fitness — lost money on four years of unseen data:

The 4h-entry candidate out-of-sample: what optimization residue looks like

In-sample robustness is necessary, but it is nowhere near sufficient. If we had published the 4h-entry backtest in March 2022, it would have looked spectacular — and then quietly bled for four years.

One candidate held — and we rejected it anyway

The 9-hours-before-fix entry kept a real edge out-of-sample: profit factor 1.36 across 283 trades, +21.7% over four years at 1% risk per trade, maximum drawdown just 6.8%, still profitable with commission stressed to $8.50/side. Part of the return is even structural — long USDJPY carries positive overnight swap, which subsidizes the pre-fix hold.
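A commission stress test re-prices the same trades at a higher cost and checks whether the profit factor (gross profit divided by gross loss) stays above 1. The sketch below assumes commission is charged per side per lot, so each round trip pays it twice. The trade list is hypothetical and is not the strategy's actual trades.

```python
Trade = tuple[float, float]  # (gross P&L in USD before commission, lots)

def profit_factor(pnls: list[float]) -> float:
    """Gross profit divided by gross loss (inf if there are no losing trades)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss

def net_pnls(trades: list[Trade], commission_per_side: float) -> list[float]:
    """Subtract round-trip commission: one charge per side, per lot."""
    return [gross - 2 * commission_per_side * lots for gross, lots in trades]

# Hypothetical trades for illustration only.
trades = [(120.0, 0.5), (-80.0, 0.5), (95.0, 0.4), (-60.0, 0.4), (150.0, 0.6), (-100.0, 0.6)]
for commission in (3.50, 8.50):
    pf = profit_factor(net_pnls(trades, commission))
    print(f"${commission:.2f}/side: PF {pf:.2f}")
```

Higher commission shrinks every winner and deepens every loser, so the profit factor can only fall. The test is whether it falls below 1.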

The 9h-entry candidate out-of-sample: a real, if modest, edge

So why reject it? Our gates require a candidate to keep at least 70% of its in-sample profit factor out-of-sample. This one kept 67%. Close — but lowering a bar after seeing the result is just data snooping with extra steps. The asymmetry is the whole point: skipping a marginal edge costs nothing, while going live on optimization residue costs real money.
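The retention gate is simple arithmetic on the results table above: divide the out-of-sample profit factor by the in-sample one and compare the ratio with 0.70. The sketch below covers only this one gate. The post does not list the others, and the function name is hypothetical.

```python
MIN_RETENTION = 0.70  # keep at least 70% of the in-sample profit factor

def pf_retention_gate(pf_in_sample: float, pf_out_of_sample: float) -> tuple[float, bool]:
    """Return (retention ratio, passed?) for the profit-factor retention gate."""
    retention = pf_out_of_sample / pf_in_sample
    return retention, retention >= MIN_RETENTION

# (in-sample PF, out-of-sample PF) from the results table
candidates = {
    "Enter 4h before fix": (2.10, 0.79),
    "Enter 9h before fix": (2.04, 1.36),
    "Enter 8h, tight stop": (2.10, 1.03),
}
for name, (pf_is, pf_oos) in candidates.items():
    retention, passed = pf_retention_gate(pf_is, pf_oos)
    print(f"{name}: kept {retention:.0%} -> {'pass' if passed else 'fail'}")
```

The candidates keep 38%, 67% and 49% of their in-sample profit factor, so all three fail. The 9h entry misses the bar by about three percentage points.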

Instead, the surviving configuration moved to a demo forward test — the one dataset nobody can overfit. If live-forward months keep tracking the out-of-sample profile, it earns another look.

The bug we found (and why we're telling you)

Midway through this study, a code review caught a position-sizing bug: the tester's tick-value data for our custom symbols was stale, so trades were risking roughly 1.5–2× the intended 1%. We fixed the sizing to use the tester's own profit calculator and re-ran the entire cycle from scratch — same conclusion, honest numbers. Every out-of-sample run is counted and disclosed (this strategy consumed 6, including the re-run). Transparency about process is the only thing that makes backtest numbers worth reading.
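The arithmetic behind fixed-fractional sizing shows how a wrong tick value turns into extra risk. Lots are set so that a stop-out loses 1% of equity, and that calculation depends on the USD value of one pip per lot. For USDJPY in a USD account, that value depends on the current rate. This Python sketch does not reproduce the MT5 fix, which used the tester's own profit calculator. It assumes a 100,000-unit lot, a 30-pip stop and a rate of 150.00, and the "stale" pip value is a made-up number chosen to land inside the 1.5–2× range described above. All function names are hypothetical.

```python
def usdjpy_pip_value_usd(rate: float, lot_units: int = 100_000, pip: float = 0.01) -> float:
    """USD value of a one-pip move on one standard lot of USDJPY at `rate` (JPY per USD)."""
    return lot_units * pip / rate

def lots_for_risk(equity: float, risk_fraction: float, stop_pips: float,
                  usd_per_pip_per_lot: float) -> float:
    """Lot size that loses `risk_fraction` of equity if the stop is hit."""
    return equity * risk_fraction / (stop_pips * usd_per_pip_per_lot)

equity, risk_fraction, stop_pips = 10_000.0, 0.01, 30
true_pip_value = usdjpy_pip_value_usd(150.0)  # ~6.67 USD per pip per lot at 150.00
stale_pip_value = 4.0                          # hypothetical stale value fed to the sizer

lots = lots_for_risk(equity, risk_fraction, stop_pips, stale_pip_value)
loss_at_stop = lots * stop_pips * true_pip_value
print(f"{lots:.2f} lots, {loss_at_stop / (equity * risk_fraction):.2f}x intended risk")
```

This prints `0.83 lots, 1.67x intended risk`. The realized risk multiple is simply the true pip value divided by the one the sizer used. Asking the platform for the real value on each trade, rather than trusting stored symbol data, removes that ratio entirely.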

What we took away

The anomaly is real. Every configuration was profitable in-sample, net of costs, consistent with the literature.
Out-of-sample discipline is the product. It killed two of three "robust" candidates before they could cost anything.
A near-miss isn't a fail-and-forget. The 9h entry is on a live demo forward test — evidence that no backtest can fake, at a price of zero.

The full interactive report — equity curves, drawdowns, yearly breakdowns for all three candidates — is on the results page.

Questions about the method? The pipeline (headless MT5 testing, plateau selection, one-shot out-of-sample, cost stress) is being built in public — we're happy to talk shop.