Recursive self-improvement, explained — plus Anthropic's bet on a chip that thinks in RAM

And top remote AI jobs from Micro1, Alignerr Language Solutions and more.

May 05, 2026

Recursive Self-Improvement is when an AI system becomes capable of improving the very process that makes AI systems better — writing better training code, tuning its own hyperparameters, designing its own evals, and eventually training a smarter successor model, all without a human in the loop. It's the concept that takes "AI as a tool" and replaces it with "AI as a participant in its own development." And right now it's not theoretical: Anthropic co-founder Jack Clark just published an essay putting 60%+ odds on AI systems training their own successors before 2029 — built entirely from public benchmark data, no inside information required.

Let's understand the concept a little better first.

1. AI CONCEPT IN 3 MINUTES

Recursive Self-Improvement (RSI)

Recursive Self-Improvement is when an AI system can make AI systems better, including itself. Instead of humans designing the training runs, writing the reward functions, and running the evals, an AI does those jobs, and the output is a smarter AI that can do them even better. Each cycle's output becomes the input to the next cycle.

How It Works

Automate the R&D Loop: The core AI research pipeline has four steps: generate hypotheses, run experiments, evaluate results, update the training process. RSI means replacing one or more of these steps with an AI agent rather than a human researcher. You don’t need all four automated to get a meaningful capability jump — even automating step three (evals) changes what’s possible.

The Feedback Loop: The improved model is used to run the next round of improvement. This is the “recursive” part — each generation of model helps train the next one. A 2x improvement compounded over five cycles is 32x total, not 10x.
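The compounding arithmetic above can be checked in a few lines. This is a minimal sketch: the function names are illustrative, and the constant 2x gain per cycle is the article's simplifying example, not a measured rate.

```python
def compounded_gain(per_cycle: float, cycles: int) -> float:
    """Total multiplier when every cycle multiplies capability by per_cycle."""
    return per_cycle ** cycles


def additive_gain(per_cycle: float, cycles: int) -> float:
    """Naive total if the per-cycle gains simply added up instead."""
    return per_cycle * cycles


# Compare the two growth patterns for a 2x gain over one to five cycles.
for n in range(1, 6):
    print(n, compounded_gain(2.0, n), additive_gain(2.0, n))
```

By the fifth cycle the compounded total is 32x, while adding the gains would give only 10x.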

Capability Overhang: RSI doesn’t require a model to be superhuman at everything. It only requires it to be superhuman at the specific task of improving models, which is a much narrower bar. Clark’s essay shows top AI systems already scoring 25–28% on post-training tasks against a human researcher baseline of 51%, roughly half of human performance. The bar doesn’t need to be 100% to trigger compounding.

The Time Horizon Test: METR’s measure of how long a task an AI can autonomously complete at 50% reliability is the clearest indicator of RSI proximity. When that number reaches “weeks” rather than “hours,” an AI can complete a full training run end-to-end without a human checkpoint. We’re currently at 12 hours.
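For a rough sense of how far "weeks" is from 12 hours, the sketch below backs out an implied doubling time from the two time-horizon figures quoted in this issue (about 30 seconds for GPT-3.5 in 2022, about 12 hours in 2026). The four-year gap and the constant doubling rate are assumptions made here for illustration. This is a naive extrapolation, not a forecast from METR or from Clark's essay.

```python
import math

# Figures quoted in this issue (METR 50%-reliability time horizon).
START_HORIZON_S = 30            # ~30 seconds, GPT-3.5 (2022)
CURRENT_HORIZON_S = 12 * 3600   # ~12 hours, frontier models (2026)

# Assumption: roughly four years separate the two measurements.
MONTHS_ELAPSED = 48


def doublings(start_s: float, end_s: float) -> float:
    """Number of times the horizon doubles going from start_s to end_s."""
    return math.log2(end_s / start_s)


def doubling_time_months(start_s: float, end_s: float, months: float) -> float:
    """Average months per doubling over the observed period."""
    return months / doublings(start_s, end_s)


def months_until(target_s: float, current_s: float, doubling_months: float) -> float:
    """Months to reach target_s if the doubling rate stayed constant."""
    return doublings(current_s, target_s) * doubling_months


rate = doubling_time_months(START_HORIZON_S, CURRENT_HORIZON_S, MONTHS_ELAPSED)
one_week_s = 7 * 24 * 3600
print(f"doublings so far: {doublings(START_HORIZON_S, CURRENT_HORIZON_S):.1f}")
print(f"implied doubling time: {rate:.1f} months")
print(f"months to a one-week horizon: {months_until(one_week_s, CURRENT_HORIZON_S, rate):.0f}")
```

The result is sensitive to the assumed dates and to whether the rate holds steady, so it is best read as an order-of-magnitude check.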

Example

Traditional loop (human-driven):

Researchers design experiment → run training → evaluate → redesign
Duration per cycle: weeks to months
Bottleneck: human researcher time

RSI loop (AI-driven):

AI agent designs experiment → runs training → evaluates → redesigns
Duration per cycle: hours to days
Bottleneck: compute, not human time

What this means concretely: Anthropic’s internal benchmark shows AI going from a 2.9x speedup on an LLM training optimization task (May 2025) to a 52x speedup by April 2026 — while a human researcher takes 4–8 hours to hit 4x. The gap is no longer close.
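The size of that jump can be made explicit with the figures above. This is a small sketch: the 11-month interval (May 2025 to April 2026) and the smooth month-over-month growth are assumptions for illustration, and the variable names are hypothetical.

```python
# Figures quoted in the text above.
SPEEDUP_MAY_2025 = 2.9   # AI mean speedup, May 2025
SPEEDUP_APR_2026 = 52.0  # AI mean speedup, April 2026
HUMAN_SPEEDUP = 4.0      # human researcher, after 4 to 8 hours of work

# Assumption: 11 months between the two measurements.
MONTHS_ELAPSED = 11


def monthly_growth_factor(start: float, end: float, months: int) -> float:
    """Constant month-over-month multiplier that turns start into end."""
    return (end / start) ** (1 / months)


overall = SPEEDUP_APR_2026 / SPEEDUP_MAY_2025
vs_human = SPEEDUP_APR_2026 / HUMAN_SPEEDUP
monthly = monthly_growth_factor(SPEEDUP_MAY_2025, SPEEDUP_APR_2026, MONTHS_ELAPSED)

print(f"AI improvement over the period: {overall:.1f}x")
print(f"AI speedup relative to the human result: {vs_human:.1f}x")
print(f"implied average monthly growth factor: {monthly:.2f}")
```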

RSI is not about a single “AGI moment” — it’s a gradual handoff of the AI R&D pipeline from human researchers to AI agents, and by Clark’s own benchmarks, that handoff is already underway.

2. TOP 3 DEVELOPMENTS

Anthropic Co-Founder Jack Clark Puts 60%+ Odds on AI Training Its Own Successors by 2029

In his newsletter Import AI, Anthropic co-founder Jack Clark published an essay putting 60%+ odds on AI systems training their own successors, with no human involvement, by the end of 2028, and 30% odds by the end of 2027. He builds the case entirely from public benchmark data rather than any inside information. The benchmark evidence spans every stage of the AI R&D pipeline. On software engineering: SWE-Bench moved from Claude 2 at ~2% in late 2023 to Claude Mythos Preview at 93.9%, effectively saturating the benchmark, while CORE-Bench for reproducing research paper results went from 21.5% to 95.5% in 15 months. On autonomous work duration: METR's time-horizon measure, which tracks how long a task AI can complete at 50% reliability, climbed from roughly 30 seconds with GPT-3.5 to 12 hours with today's frontier models; METR researcher Ajeya Cotra considers 100-hour runs plausible by the end of 2026. The most striking datapoint comes from Anthropic's internal LLM training optimization task, where AI went from a 2.9x mean speedup (Claude Opus 4, May 2025) to a 52x mean speedup by April 2026, while a human researcher requires 4–8 hours just to reach a 4x speedup on the same task. On post-training: as of March 2026, AI systems can post-train open-weight models to capture about half the uplift a human researcher achieves, with top systems (Opus 4.6, GPT-5.4) scoring 25–28% against a human baseline of 51%. Meanwhile, OpenAI is targeting an automated AI research intern by September 2026.

Details & Specifications

Benchmark trajectory:

SWE-Bench: 2% (Claude 2, 2023) → 93.9% (Mythos Preview, 2026); CORE-Bench: 21.5% → 95.5% in 15 months; MLE-Bench (Kaggle): 16.9% → 64.4%; PostTrainBench: AI at ~25–28% vs human score of 51%

METR time horizons:

30 seconds (GPT-3.5, 2022) → 12 hours (frontier models, 2026) → 100 hours considered plausible by end of 2026 (Ajeya Cotra)

Anthropic internal task:

LLM training optimization speedup: 2.9x (Opus 4, May 2025) → 52x (April 2026); human baseline: 4x in 4–8 hours

Industry signals:

OpenAI targeting automated research intern by September 2026; Anthropic publishing proof-of-concept for automated alignment researchers; startups like Recursive Superintelligence explicitly targeting self-improvement

Clark's odds:

30% by end-2027; 60%+ by end-2028; full essay published in Import AI issue 455
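One way to read the two odds together is to ask what they imply for 2028 alone. The sketch below applies the standard conditional-probability formula to the two figures. It treats "60%+" as exactly 60%, which is an assumption that makes the result a lower bound, and the function name is illustrative.

```python
def conditional_odds(p_by_earlier: float, p_by_later: float) -> float:
    """P(event happens after the earlier date but by the later date,
    given that it had not happened by the earlier date)."""
    if not 0.0 <= p_by_earlier <= p_by_later <= 1.0:
        raise ValueError("need 0 <= p_by_earlier <= p_by_later <= 1")
    if p_by_earlier == 1.0:
        raise ValueError("event is already certain by the earlier date")
    return (p_by_later - p_by_earlier) / (1.0 - p_by_earlier)


# 30% by end-2027 and 60% by end-2028, as quoted above.
p_2028_given_not_2027 = conditional_odds(0.30, 0.60)
print(f"{p_2028_given_not_2027:.0%}")
```

In other words, if the milestone has not arrived by the end of 2027, these figures imply a chance of roughly three in seven (or higher) that it arrives during 2028.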

Sierra Raises $950M at $15B Valuation — 40% of Fortune 50 Now Running on Its AI Agents

Bret Taylor's AI startup Sierra raised $950 million in a Series E round led by Tiger Global and GV, pushing its post-money valuation above $15 billion and giving the company more than $1 billion in capital to pursue what it calls the "global standard for AI-powered customer experiences." The growth numbers behind the raise are remarkable: Sierra reached $150 million in annual recurring revenue in just eight quarters — a milestone the company claims no traditional software firm has ever hit so quickly — with penetration across more than 40% of the Fortune 50, including Prudential, Cigna, Blue Cross Blue Shield, and Rocket Mortgage. The practical performance metrics from live deployments are what have won enterprise trust: Nordstrom launched its voice agent on Sierra in just five weeks; Singtel achieved over 70% autonomous resolution rates within 10 weeks of deployment. The strategic upside Sierra is selling into is enormous: the global customer service market represents roughly $400 billion annually, with an increasing share flowing toward AI-powered agents — and Taylor is explicitly framing this as a winner-take-most category, calling Sierra "multiples larger than the next biggest" and investing aggressively to extend that lead.

Details & Specifications

Training/Architecture:

Agent SDK as the core developer tool; prepackaged agent skills for common customer workflows; Ghostwriter — an agent that creates and optimizes other agents from natural language descriptions of desired outcomes; no-code and full-code paths; model-agnostic underlying architecture

Models/Checkpoints:

Platform-level product, not a single model; runs on top of frontier models; specific underlying models not publicly disclosed; multi-model routing supported

Performance:

Billions of customer interactions handled annually; 70%+ autonomous resolution rates (Singtel deployment); $150M ARR in 8 quarters; 40%+ Fortune 50 penetration; agents deployed across mortgage refinancing, insurance claims, returns processing, nonprofit fundraising

Pricing:

Enterprise SaaS pricing; not publicly disclosed; $150M ARR across the customer base suggests, but does not confirm, substantial per-seat or per-interaction pricing at scale

Integration/Availability:

Available via Agent SDK; GV, Tiger Global, Benchmark, Sequoia, Greenoaks among investors; $350M previous round raised 8 months ago; IPO described as "definitely in our future" but no timeline given

Anthropic in Talks to Buy Chips From Fractile — SRAM-Based Inference at a Fraction of GPU Cost

Anthropic has reportedly held early discussions with London-based chip startup Fractile about purchasing its inference accelerators, which would add Fractile as a fourth source of AI server silicon alongside Nvidia, Google, and Amazon — driven by surging Claude demand straining its current infrastructure. The core architectural bet Fractile is making is memory-compute fusion: unlike standard chips that constantly shuttle data between the processor and separate DRAM memory modules, Fractile's design keeps data directly on the chip using SRAM, which does not need to be refreshed — a method the company claims can run large language models up to 100x faster while lowering operational costs by 90%. The timing makes the talks strategically urgent: Anthropic's annualized revenue surged from roughly $9 billion at the end of 2025 to over $30 billion as of April 2026, with inference costs — not training — being the primary margin squeeze, a pain point shared with OpenAI and every other lab running frontier models at scale. The talks remain exploratory, with Fractile's chips not expected to reach commercial readiness until around 2027, placing any deployment well outside Anthropic's near-term procurement plans.
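Why data movement matters can be seen with a back-of-the-envelope bound. When a model generates one token at a time for a single request, each token requires reading roughly all of the model's weights, so throughput cannot exceed memory bandwidth divided by model size. The sketch below applies that bound. Every number in it is a hypothetical placeholder chosen for illustration, not a specification of Fractile's chip or of any GPU, and it ignores batching, KV-cache traffic, and compute limits.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-request decode speed when each generated
    token requires streaming all model weights from memory once."""
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical model: 70 billion parameters stored at 2 bytes each.
MODEL_BYTES = 70e9 * 2

# Hypothetical bandwidths, for illustration only.
OFF_CHIP_BANDWIDTH = 3e12                     # weights streamed from separate memory
ON_CHIP_BANDWIDTH = 100 * OFF_CHIP_BANDWIDTH  # weights held next to the compute

off_chip = max_tokens_per_second(MODEL_BYTES, OFF_CHIP_BANDWIDTH)
on_chip = max_tokens_per_second(MODEL_BYTES, ON_CHIP_BANDWIDTH)

print(f"off-chip bound: {off_chip:.1f} tokens/s")
print(f"on-chip bound:  {on_chip:.1f} tokens/s")
print(f"ratio: {on_chip / off_chip:.0f}x")
```

Under this simplified model the speedup equals the bandwidth ratio, which is why designs that keep weights in on-die memory target the inference bottleneck directly. Real-world gains depend on factors the bound leaves out.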

Details & Specifications

Training/Architecture:

SRAM-based in-memory compute — calculations run directly in static RAM on-die, eliminating the DRAM data-movement bottleneck that limits GPU inference throughput; co-locates memory and compute on the same die; custom software stack built alongside the hardware

Models/Checkpoints:

No production chip yet; commercial readiness projected ~2027; team includes engineers from Graphcore, Nvidia, and Imagination Technologies; founded 2022 by Oxford PhD Walter Goodwin

Performance:

Fractile claims up to 100x faster LLM inference vs conventional GPUs at 90% lower operating cost (vendor-claimed, not independently verified at scale); the SRAM architecture is designed to avoid the memory bandwidth wall that throttles GPU inference at long context

Pricing:

Fractile raised $15M seed from Kindred Capital, NATO Innovation Fund, and Oxford Science Enterprises; now in talks to raise $200M at a $1B+ valuation with Founders Fund, 8VC, and Accel among potential investors; chip pricing not disclosed

Integration/Availability:

Not commercially available; targeted ~2027 readiness; Anthropic talks in early stage with no volume or delivery commitments defined; Fractile has committed £100M to expand engineering in London and Bristol over three years; potential fourth Anthropic supplier track alongside Nvidia, Google TPUs, and AWS Trainium

3. AI CAREER OPPORTUNITIES

1. AI Model Assessment Specialist
📍 Micro1 | Remote US | Full-Time
🔗 Apply Here

2. AI Ethics Reviewer
📍 Alignerr | Remote | Full-Time
🔗 Apply Here

3. Lead AI / ML Engineer
📍 MeetRecord | Remote | Full-Time
🔗 Apply Here

4. AI Engineer
📍 aiAvenu, inc | Remote | Full-Time
🔗 Apply Here

We track real AI shifts - with facts, without hype
• The most important daily AI advancements, summarized with tech specs
• One critical AI concept explained in simple terms
• Curated AI jobs and projects, all remote-friendly
To make the most of AI, subscribe to the newsletter and share it with other AI professionals.


Deep Tech Stars
