A small Miami startup says it has done what every frontier AI lab has been chasing for nearly a decade. The company, called Subquadratic, claims its new SubQ model finally escapes the quadratic scaling wall that has bottlenecked transformers since 2017, and it has the eye-popping numbers (and skeptical critics) to match.
- Subquadratic says its SubQ 1M-Preview model reduces attention compute by nearly 1,000x at 12 million tokens compared with frontier models.
- The company raised $29 million in seed funding at a reported $500 million valuation, backed by Tinder co-founder Justin Mateen and others.
- AI researchers are split between calling the work meaningful progress and warning it could turn into another Magic.dev-style letdown.
Why Quadratic Scaling Has Held AI Back
Transformers power almost every major language model on the planet, from ChatGPT to Claude. But they carry a built-in tax. Every token is compared against every other token, so as inputs grow, the number of interactions, and with it the compute required to process them, scales quadratically. That relationship has shaped what gets built, what systems cost, and where practical limits show up in real-world use.
In plain English, doubling the input quadruples the cost. That math is the reason long-context models get expensive fast, and the reason developers stitch together retrieval pipelines, prompt curation, and other workarounds to fit big jobs into small windows.
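To make the arithmetic concrete, here is a minimal sketch that counts token-to-token comparisons in standard (dense) attention. The function name `attention_pairs` is made up for illustration, and the count ignores constant factors such as model width and number of layers.

```python
def attention_pairs(num_tokens: int) -> int:
    """Count token-to-token comparisons in dense self-attention.

    Every token is compared against every token, so the count is n * n.
    """
    return num_tokens * num_tokens


for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_pairs(n):>12,} comparisons")

# Doubling the input from 1,000 to 2,000 tokens quadruples the pair count.
ratio = attention_pairs(2_000) / attention_pairs(1_000)
print(ratio)  # 4.0
```

Each doubling of the input multiplies the comparison count by four, which is why costs climb so quickly at long context lengths.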
Subquadratic’s pitch is that those tricks are a dead end. The company’s approach, called Subquadratic Sparse Attention or SSA, is built on a straightforward premise: most of the token-to-token comparisons in standard attention are wasted compute. Instead of comparing every token to every other token, SSA learns to identify which comparisons actually matter and computes attention only over those positions. The selection is content-dependent, meaning the model decides where to look based on meaning, not on fixed positional patterns.
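Subquadratic has not published how SSA picks its positions, so the following is only a generic illustration of content-dependent sparse attention, not the company's method. It assumes a simple top-k rule: each query keeps the `k` keys with the highest similarity scores and ignores the rest. The function names and the top-k rule are assumptions made for this sketch.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k best-matching keys.

    Illustrative only: the selection here scores every key, which is
    itself quadratic overall. A genuinely subquadratic method would need
    a cheaper way to choose positions (not specified in the source).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [dot(q, key) * scale for key in keys]
        # Content-dependent selection: positions are chosen by score,
        # not by a fixed positional pattern.
        chosen = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in chosen])
        out = [sum(w * values[i][d] for w, i in zip(weights, chosen))
               for d in range(dim)]
        outputs.append(out)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(topk_sparse_attention([[1.0, 0.0]], keys, values, k=1))  # [[1.0, 0.0]]
```

With `k=1` the query simply returns the value attached to its best-matching key. Setting `k` to the full number of keys recovers ordinary dense attention, which is the comparison the company's pitch rests on.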
The Numbers That Have Everyone Talking
Subquadratic emerged from stealth on Tuesday with a sweeping claim that it has built the first large language model to fully escape the mathematical constraint that has defined every major AI system since 2017. The company says its first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture, one where compute grows linearly with context length.
The headline figure is dramatic. At 12 million tokens, the company says, its architecture reduces attention compute by almost 1,000 times compared to other frontier models, a figure that, if validated independently, would dwarf the efficiency gains of any existing approach. Cost claims are just as bold. Subquadratic told SiliconANGLE that on the RULER 128K benchmark, SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus.
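A back-of-envelope check shows what those figures would imply. The sketch below assumes a simplified cost model in which dense attention costs n × n comparisons and sparse attention costs n × k, where k is the number of positions each token attends to. The company has not disclosed any such value, so `k = 12_000` is purely hypothetical, chosen only because it reproduces a 1,000x ratio at 12 million tokens.

```python
def dense_cost(n: int) -> int:
    """Comparisons for dense attention: every token against every token."""
    return n * n


def sparse_cost(n: int, k: int) -> int:
    """Comparisons if each token attends to only k positions (assumed model)."""
    return n * k


n = 12_000_000  # context length cited by the company
k = 12_000      # HYPOTHETICAL positions per token; not a published figure

reduction = dense_cost(n) / sparse_cost(n, k)
print(reduction)  # 1000.0

# Reported RULER 128K costs: about $2,600 for Claude Opus vs. $8 for SubQ.
cost_ratio = 2600 / 8
print(cost_ratio)  # 325.0
```

Under this model the reduction is simply n / k, so the advantage grows with context length. That fits the claim being pegged to 12 million tokens, while the reported 128K cost gap works out to a smaller 325x.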
Alongside the model, the company is launching three products into private beta: an API exposing the full context window, a command-line coding agent called SubQ Code, and a search tool called SubQ Search.
Why Researchers Are Skeptical
Within hours of the launch, AI circles on X lit up. Critics zeroed in on the gap between what Subquadratic reported in its research paper and what the production model actually delivered. On MRCR v2, the company reported a research score of 83, but the third-party verified production model scored 65.9. That roughly 17-point gap between the lab result and the shipping product is notable and largely unexplained.
Developer Stepan Goncharov called the benchmarks “very interesting cherry-picked benchmarks,” while another commenter described them as “suspiciously perfect.” Others questioned the model’s pedigree. Prominent AI engineer Will Depue initially noted that SubQ is “almost surely a sparse attention finetune of Kimi or DeepSeek,” referring to existing open-source models. Subquadratic CTO Whedon confirmed the open-source starting point on X, writing that the company is “using weights from open-source models as a starting point, as a function of our funding and maturity as a company.”
Not everyone is dismissive, though. AI researcher John Rysana pushed back on comparisons to Theranos, writing that the work is “just subquadratic attention done well which is very meaningful for long context workloads,” and that “odds of it being BS are extremely low.”
The Ghost of Magic.dev Looms Large
Part of the skepticism is grounded in recent history. Magic.dev announced a 100-million-token context-window model in August 2024, with a claimed 1,000x efficiency advantage, and raised roughly $500 million on the strength of those claims. Two years later, evidence of that model in production has been thin.
Subquadratic does bring some credibility to the table. Its 35-person team includes 11 PhDs from Meta, Google, Oxford, Cambridge, and BYU. CEO Justin Dangel, a five-time founder, and CTO Whedon, formerly of Meta, both have a background in sparse attention. Still, no peer-reviewed paper, no open weights, and a small, hand-picked benchmark suite leave plenty of room for doubt.
What to Watch From Here
If SubQ holds up under outside scrutiny, the economics of long-context AI could shift in a hurry. Coding agents could load entire repositories, legal teams could feed in years of contracts, and developers could ditch fragile retrieval setups. If it doesn’t, Subquadratic risks becoming another cautionary tale about hype outpacing reproducible science. The next milestones to watch are a full technical report and outside labs rerunning the benchmarks. Until then, treat the 1,000x figure as a hypothesis, not a fact.
