Science is drowning in its own output

The global scientific community now produces over four million peer-reviewed papers per year. A researcher aiming to conduct a rigorous systematic literature review (SLR) in any mature field must assess thousands of papers, a task that currently takes six to eighteen months of human effort for a single comprehensive review.

AI tools exist to help. But they all share a fundamental architectural ceiling: they can only see a fixed-size window of text at once. Feed them too much, and they lose information. They truncate. They summarize away nuance. They hallucinate synthesis. The root cause is not intelligence. It is memory architecture.

Two prominent approaches have emerged to cope with these constraints. Retrieval-Augmented Generation (RAG) selects the chunks it judges relevant and discards the rest: a bet that relevance can be determined before synthesis. Context compaction techniques compress information through summarization, offering efficiency at the cost of detail. Both are fundamentally lossy. Both assume that portions of the input matter less than others and can be safely omitted.

Examining these approaches through the lens of the bitter lesson — the observation that methods which leverage scale and learning consistently outperform those built on human-engineered heuristics — suggests we need a different abstraction entirely. Not one that helps the model cope with its limitations, but one that removes them.

A different abstraction

Recursive Language Models offer a way out. RLMs treat the user prompt as part of an external environment that the model can control programmatically. Their defining characteristic is the ability to manipulate input through symbolic handles without copying text into the context window, enabling models to recurse on portions of the input from outside.

In practice, this means providing the model with a persistent code environment where the initial prompt is loaded as a variable. The model generates code to understand and transform chunks of the input, building intermediate values and the final response into new variables, potentially invoking sub-RLMs within loops. The model never needs to hold the full input in memory — it operates on it.
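The loop described above can be sketched in a few lines of Python. Everything here is illustrative: `call_llm`, `RLMEnvironment`, and the fixed-size chunking strategy are hypothetical stand-ins for whatever model call and decomposition a real RLM would generate, not the actual system.

```python
def call_llm(instruction: str, text: str) -> str:
    # Placeholder: a real system would invoke a language model here.
    return f"summary({len(text)} chars)"

class RLMEnvironment:
    def __init__(self, prompt: str):
        self.prompt = prompt   # full input, held outside any context window
        self.vars = {}         # intermediate values built by generated code

    def peek(self, start: int, end: int) -> str:
        """Inspect a slice of the input without loading the rest."""
        return self.prompt[start:end]

    def sub_rlm(self, instruction: str, start: int, end: int) -> str:
        """Recurse on a span: only that span enters the sub-call's context."""
        return call_llm(instruction, self.prompt[start:end])

# The model operates on the input through the handle, never holding it whole.
env = RLMEnvironment("long corpus text " * 10_000)
chunk_size = 40_000
partials = [
    env.sub_rlm("Summarize this chunk.", i, i + chunk_size)
    for i in range(0, len(env.prompt), chunk_size)
]
env.vars["answer"] = call_llm("Combine these.", "\n".join(partials))
```

The point of the pattern is that the top-level call only ever sees short instructions and short intermediate results; the 170,000-character input itself stays in the environment.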

RLMs have shown strong results on long-context benchmarks, and their application to literature synthesis is particularly promising. But the original work studied them primarily through prompting and distillation, with restricted recursion depth and synchronous execution. Three specific limitations constrain their real-world applicability:

1. Training methodology. Prompting prevents models from learning from their mistakes. Distillation only transfers existing behavior from a teacher model, without enabling the student to discover strategies the teacher never exhibited.
2. Recursion depth. The original work restricts maximum depth to one. Following the general principle of the approach, models should determine the optimal recursion depth based on task requirements.
3. Synchronous execution. All sub-calls run sequentially, leading to prohibitive inference times. Allowing models to choose between synchronous and asynchronous calls could yield substantial efficiency gains.
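To make the last limitation concrete, here is a minimal sketch using Python's asyncio of why asynchronous sub-calls cut wall-clock time relative to a sequential baseline. `sub_rlm` is a hypothetical placeholder that simulates inference latency with a sleep; independent chunks fan out concurrently, so total latency is roughly one sub-call instead of one per chunk.

```python
import asyncio
import time

async def sub_rlm(instruction: str, chunk: str) -> str:
    # Placeholder for a real sub-call; the sleep models inference latency.
    await asyncio.sleep(0.05)
    return f"result({len(chunk)})"

async def run_async(chunks):
    # Independent sub-calls launched concurrently.
    return list(await asyncio.gather(*(sub_rlm("analyze", c) for c in chunks)))

async def run_sync(chunks):
    # Sequential baseline: each sub-call waits for the previous one.
    return [await sub_rlm("analyze", c) for c in chunks]

chunks = ["x" * 100] * 8

t0 = time.perf_counter()
async_results = asyncio.run(run_async(chunks))
async_time = time.perf_counter() - t0

t0 = time.perf_counter()
sync_results = asyncio.run(run_sync(chunks))
sync_time = time.perf_counter() - t0
```

With eight chunks, the sequential version pays eight latencies while the concurrent one pays roughly one; the results are identical either way, which is what makes the choice between the two a pure efficiency decision for the model to learn.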

Our contribution: reinforcement learning for RLMs

We introduce the first application of reinforcement learning to training Recursive Language Models, using Reinforcement Learning from Verifiable Rewards (RLVR) with the Dr. GRPO algorithm, and Low-Rank Adaptation (LoRA) as the primary training mechanism.

RL training unlocks two capabilities that prior approaches could not achieve. First, models explore and learn from their own decomposition mistakes, discovering recursive strategies that never existed in any training data. Second, cost-aware behavior emerges: the model learns to balance synthesis quality against computational expenditure, producing an efficiency profile that arises from training rather than human engineering.
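A minimal sketch of how these pieces could fit together: Dr. GRPO computes group-relative advantages by subtracting the group-mean reward, dropping the per-group standard-deviation normalization (and per-token length normalization) used in GRPO, and a cost-aware reward can be shaped by penalizing tokens spent. The reward shaping and the `lam` coefficient below are illustrative assumptions, not our trained configuration.

```python
def cost_aware_reward(correct: bool, tokens_used: int, lam: float = 1e-5) -> float:
    # Hypothetical shaping: a verifiable 0/1 reward minus a compute penalty.
    return float(correct) - lam * tokens_used

def dr_grpo_advantages(rewards: list) -> list:
    # Dr. GRPO: advantage = reward minus the group mean, with no division
    # by the group standard deviation (unlike the original GRPO).
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# One group of rollouts for the same task: (solved?, tokens spent).
rollouts = [(True, 20_000), (True, 60_000), (False, 10_000), (False, 40_000)]
rewards = [cost_aware_reward(ok, toks) for ok, toks in rollouts]
advantages = dr_grpo_advantages(rewards)
```

Under this shaping, the rollout that solved the task with fewer tokens receives the largest advantage, which is exactly the pressure that makes cost-awareness an emergent behavior rather than an engineered rule.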

LoRA is central to this work: not only as a practical constraint for accessible deployment, but as an experimental lens to study how little parameter movement is sufficient to unlock robust recursive behavior. If recursive reasoning is the right abstraction for scaling beyond context windows, it should not stay locked behind proprietary models and prohibitive compute budgets. The core capability ships as lightweight adapter weights that run on commodity hardware.
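The LoRA mechanics can be sketched in plain NumPy: a frozen weight W is augmented by a low-rank update scaled by alpha/rank, and only the two small factors A and B are trained. The dimensions and initialization scheme below are the standard recipe, used here illustratively; B starts at zero, so the adapted layer begins as an exact no-op on the base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 64, 8, 16

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + (alpha / rank) * B @ A; only A and B train.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / rank)

x = rng.normal(size=(4, d_in))
base = x @ W.T            # base model output
adapted = lora_forward(x) # identical at initialization, since B is zero

full_params = W.size
adapter_params = A.size + B.size  # what actually ships as the adapter
```

At rank 8 the adapter holds 1,024 parameters against 4,096 in the base weight, and the gap widens dramatically at realistic model dimensions, which is why the capability can ship as lightweight adapter weights.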

The bottleneck is never the intelligence of the analyst. It is always the number of documents any single person — or any single AI call — can hold in mind at once. We remove that ceiling entirely.

First deployment: systematic review at any scale

The first application of this technology is systematic literature review. An SLR conducted with our system processes every paper in the corpus fully — no truncation, no sampling, no information loss. Each document is handled by a sub-agent. The system produces an auditable execution trace: deterministic, programmatic, reproducible. And because it learns through reinforcement, it becomes more efficient with every deployment.
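A minimal sketch of the per-document loop with an auditable trace, assuming a deterministic `process_document` stand-in for the real sub-agent call: each step records a content hash of its input alongside its output, so a run can be replayed and verified step by step.

```python
import hashlib
import json

def process_document(doc: str) -> str:
    # Hypothetical sub-agent call; deterministic stand-in for illustration.
    return f"findings({len(doc)} chars)"

def run_slr(corpus: list) -> tuple:
    trace, findings = [], []
    for i, doc in enumerate(corpus):
        out = process_document(doc)
        findings.append(out)
        # Each step is logged with a content hash so the execution
        # trace is auditable and the run is reproducible.
        trace.append({
            "step": i,
            "doc_sha256": hashlib.sha256(doc.encode()).hexdigest(),
            "output": out,
        })
    return findings, trace

corpus = ["paper one text", "paper two text", "paper three"]
findings, trace = run_slr(corpus)
trace_json = json.dumps(trace, sort_keys=True)
```

Because every document passes through its own sub-call and every step lands in the trace, nothing is sampled away and two runs over the same corpus produce byte-identical traces.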

No current AI tool for literature review offers this combination. Existing systems degrade in effective quality beyond approximately 50,000 tokens, hit hard limits around 200,000 tokens, and require human curation of what to include in the context. Our architecture is unbounded by design.

|                  | Current AI approaches                                  | RL-trained RLMs                                        |
|------------------|--------------------------------------------------------|--------------------------------------------------------|
| Input scale      | Hard limit ~200K tokens; quality degrades beyond ~50K  | Architecturally unbounded                              |
| Information loss | Truncation and summarization discard evidence          | Zero loss: every document fully processed              |
| Cost control     | Costs scale with corpus; no native optimization        | Cost-aware reward training minimizes compute per task  |
| Reproducibility  | Sensitive to prompt design and document ordering       | Deterministic, auditable execution trace               |
| Improvement      | Static; bounded by distillation data                   | Learns from experience via reinforcement               |

Where this matters

The immediate applications sit at the intersection of large literature corpora and high-stakes decision-making.

Clinical evidence synthesis. Regulatory bodies and hospital systems require systematic evidence reviews before guideline changes. Current timelines of twelve to eighteen months create dangerous lags between research findings and clinical practice. Organizations like Cochrane produce fewer than 200 full reviews per year due to capacity constraints.

Life sciences and pharma R&D. Literature review is a critical early step in target identification. The AI-in-research market for life sciences holds the dominant share of dataset licensing revenue. Faster, complete synthesis directly reduces time to candidate.

Academic research infrastructure. Universities, grant-writing teams, and funding bodies spend significant budget on literature synthesis. The $19 billion academic publishing industry is actively adopting AI tools. An SLR system with no information ceiling addresses a gap none of those tools currently fills.

Patent and technology intelligence. IP firms and corporate R&D labs commission exhaustive prior art searches. RLMs can cover every filing in a domain without sampling bias, at a fraction of analyst cost.

Why this is hard to replicate

This is not a wrapper around a foundation model. The work represents genuine research-level advances with durable competitive properties.

We are the first to train RLMs with reinforcement learning. Distillation, the only existing alternative, cannot discover strategies absent from the teacher model's behavior. Our cost-aware behavior is learned, not engineered: the model discovers how to minimize compute while maximizing accuracy, producing strategies a human prompt engineer would be unlikely to design.

The training infrastructure itself is a standalone asset. Constructing verifiable long-context tasks at 100K to 2M token scale requires significant pipeline investment that is not trivially replicated. And because the capability ships as LoRA adapters — not monolithic models — deployment costs are dramatically lower than competitors requiring proprietary model serving infrastructure.

Open weights build community trust and research adoption. The surrounding tooling, domain-specific fine-tuned variants, and enterprise integrations form the commercial layer that converts open-source adoption into revenue.

What comes next

The research component is underway. The validated training pipeline produces open-weight RLM adapters with externally verifiable benchmarks. Over the coming months, we deploy a working SLR system on a target domain, run pilot programs with academic institutions and research hospitals, and move toward a commercial beta with API and web interface.

The scientific and commercial opportunity is real. The technical moat is defensible. And the research is already producing results. What comes next is the transition from validated findings to deployed product — faster drug discovery, safer clinical guidelines, better-informed policy, and a commercially viable system built on infrastructure that scales with the world's growing knowledge base rather than being bounded by it.

If your work is bottlenecked by the scale of existing literature, we want to hear from you.

Get in touch