Streaming experts
Simon Willison
Archived Mar 24, 2026
✓ Full text saved
I wrote about Dan Woods' experiments with streaming experts the other day , the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model by instead streaming the necessary expert weights from SSD for each token that you process. Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported running the colossal Kimi K2.5 - a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on
✦ Summarize
☆ Save
Full text archived locally
Simon Willison’s Weblog
Subscribe
Sponsored by: WorkOS — The infrastructure fast-growing B2B companies use to sell to Enterprise.
I wrote about Dan Woods' experiments with streaming experts the other day, the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model by instead streaming the necessary expert weights from SSD for each token that you process.
Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported running the colossal Kimi K2.5 - a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on an M2 Max MacBook Pro.
And @anemll showed that same Qwen3.5-397B-A17B model running on an iPhone, albeit at just 0.6 tokens/second - iOS repo here.
I think this technique has legs. Dan and his fellow tinkerers are continuing to run autoresearch loops in order to find yet more optimizations to squeeze more performance out of these models.
Posted 24th March 2026 at 5:09 am
Recent articles
Experimenting with Starlette 1.0 with Claude skills - 22nd March 2026
Profiling Hacker News users based on their comments - 21st March 2026
Thoughts on OpenAI acquiring Astral and uv/ruff/ty - 19th March 2026
This is a note by Simon Willison, posted on 24th March 2026.
definitions 50 ai 1927 generative-ai 1709 local-llms 148 llms 1675 qwen 52 kimi 9 autoresearch 3
Monthly briefing
Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.
Pay me to send you less!
Sponsor & subscribe
Disclosures Colophon © 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026