Sakana AI Bets on Agent Orchestration Over Frontier Models
Data Breach Today
Fugu Uses Multiple Agents and Models to Rival GPT-5.5, Mythos
Emilia David • June 22, 2026
Collective action beats a monolith, say independent artificial intelligence companies that are narrowing the gap between open-weight and proprietary frontier models by coordinating multiple models rather than relying on a single one.
Japanese company Sakana AI says it has created a system with capabilities equivalent to Anthropic's Mythos Preview and OpenAI's GPT-5.5. It claims that its new model orchestration API, Fugu, can match the performance of top models by relying on a system of agents and models to complete tasks, rather than depending on a single powerful model.
Fugu is not a typical open-source large language model designed to do everything. Rather, it is what Sakana AI calls an "orchestration model": a multi-agent system that behaves like a single LLM. When users ask Fugu to do something, the system decides how to handle the request via a team of agents. If the task requires a more complicated response, Fugu taps different models.
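The routing idea described above can be illustrated with a minimal Python sketch. This is not Sakana's implementation or API: the `Orchestrator` class, the stub model functions and the word-count complexity heuristic are all hypothetical, chosen only to show how a single entry point might hide a decision about which models to consult.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "model" here is any function from prompt text to response text.
# These stand in for real model backends; none of this is Sakana's API.
Model = Callable[[str], str]


@dataclass
class Orchestrator:
    models: Dict[str, Model]      # hypothetical pool of backends, by name
    simple_model: str             # backend used for easy requests
    complex_models: List[str]     # backends consulted for hard requests
    word_threshold: int = 12      # toy complexity cutoff (an assumption)

    def is_complex(self, prompt: str) -> bool:
        # Toy heuristic: long or multi-step prompts count as complex.
        return len(prompt.split()) > self.word_threshold or " then " in prompt

    def complete(self, prompt: str) -> str:
        # Callers see one entry point, as if talking to a single LLM.
        if not self.is_complex(prompt):
            return self.models[self.simple_model](prompt)
        # For harder tasks, consult several backends and join their drafts.
        drafts = [self.models[name](prompt) for name in self.complex_models]
        return " | ".join(drafts)


# Stub backends that only tag the prompt, so the routing is visible.
def fast_model(prompt: str) -> str:
    return f"fast:{prompt}"


def coder_model(prompt: str) -> str:
    return f"coder:{prompt}"


def reasoner_model(prompt: str) -> str:
    return f"reasoner:{prompt}"


orchestrator = Orchestrator(
    models={"fast": fast_model, "coder": coder_model, "reasoner": reasoner_model},
    simple_model="fast",
    complex_models=["coder", "reasoner"],
)
```

In this sketch, `orchestrator.complete("hello")` goes to the single fast backend, while a multi-step prompt such as "parse the file then summarize it" is sent to both the coder and reasoner stubs. A production system would presumably replace the keyword heuristic with a learned router and the string join with a real aggregation step.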
Sakana AI said this kind of collective intelligence is a hedge against overreliance on a single company and a way for stronger capabilities to emerge from smaller companies. An orchestration model that taps into several models could prevent disruptions that enterprises have no control over, such as the recent U.S. government decision to impose export controls on Fable 5 and Mythos 5 (see: US Anthropic Export Controls Sparks Sharp EU Reaction).
"Recent disruptions in the AI landscape have demonstrated the severe risk of single-vendor dependency," Sakana said in a blog post. "For an organization or a nation, relying on a single company's APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality."
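The single-vendor risk Sakana describes is commonly handled in software with a failover pattern. The following Python sketch is illustrative only and assumes nothing about how Fugu works internally: the `ProviderUnavailable` exception, the `complete_with_fallback` helper and both vendor stubs are hypothetical names.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Raised by a backend that cannot serve requests (hypothetical)."""


# A provider is a (name, callable) pair; the callable maps prompt -> response.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (name, response) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and move on to the next vendor.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def blocked_vendor(prompt: str) -> str:
    # Simulates a vendor whose API has become inaccessible.
    raise ProviderUnavailable("access restricted")


def backup_vendor(prompt: str) -> str:
    return f"backup answer to: {prompt}"


result = complete_with_fallback(
    "summarize Q2 risks",
    [("vendor_a", blocked_vendor), ("vendor_b", backup_vendor)],
)
```

Here the request falls through to `vendor_b` once `vendor_a` reports that it is unavailable, so the caller still gets an answer without changing its own code.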
Sakana offers two tiers of Fugu: the base Fugu, which the company said balances performance and low latency and works best for coding tasks, and Fugu Ultra, for solving multi-step problems and tasks requiring accuracy and depth. Sakana said Fugu Ultra "stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview across the industry's most rigorous engineering, scientific and reasoning benchmarks."
Large companies with massive research budgets, such as Anthropic, OpenAI and Google, have an advantage in building highly capable models. While the past few years have shown that open-source projects can rival the performance of frontier models - as evidenced by the rise of DeepSeek R1, the Qwen models from Alibaba and Mistral's family of models - many open-weight models struggle without access to ever-larger amounts of data and compute and to more extensive post-training alignment, resources that many closed-source model providers can easily access by raising capital.
Enterprises also believe that open-source projects do not have the same level of model provenance, data handling and auditability. Sakana argues that the Fugu system sidesteps that problem. Instead of forcing enterprises to choose a model that fits their needs and decide between open and closed systems, it offers a way to coordinate existing models. Through that coordination, the company argues, collective intelligence becomes more capable than a single model.
According to Sakana's benchmark testing, Fugu Ultra scored 93.2 on LiveCodeBench, beating Fable 5's 89.8, Gemini 3.1's 88.5 and the 85.3 logged by GPT-5.5. On SWE-Bench Pro, only Fable 5 scored higher than Fugu Ultra. These scores are reported by the company and have not been independently verified.
Early users on social media have had mixed responses. Wharton professor Ethan Mollick said he found Fugu Ultra slow but ultimately fine, while others praised the novelty of the company's approach.
"This is generally how applied AI products are building their agent harnesses at this point, but the idea of making this an LLM that any developer can interact with is also a great idea. As we get more innovation with both frontier closed and OSS models, there's going to be a ton of value produced for the layer that can route the best," Box CEO Aaron Levie said in a post on X.