Formal verification, explained — plus the AI that proved decade-old math problems for $200 each

And top remote AI jobs from Tether, grail.computer and more.

May 25, 2026

Formal verification is a mathematical technique that proves a system meets its specification: not "probably correct based on testing," but provably correct, the same way 2+2=4 is correct. In traditional software, you test for bugs; in formally verified software, you prove that the bugs the specification rules out cannot exist. It's the technology at the heart of AlphaProof Nexus cracking decade-old math problems that human experts hadn't solved: when Lean accepts a proof, that proof is valid for the formal statement it was checked against, with mathematical certainty rather than the hallucinated confidence of a language model.

Let's understand the concept a little better first.

1. AI CONCEPT IN 3 MINUTES

Formal Verification

Formal verification is the process of using mathematical logic to prove that a system — a piece of software, a hardware circuit, a mathematical proof — is correct, with absolute certainty rather than probabilistic confidence. Think of it as the difference between testing a bridge by driving trucks over it until it breaks, versus proving with mathematics that it cannot break under any load below a given threshold.

How It Works

The Specification: First, you write a formal specification — a precise mathematical description of what the system is supposed to do. This is harder than it sounds: translating “this smart contract should not allow double-spending” into a logical formula requires rigor that natural language descriptions don’t enforce.

The Proof Assistant: A tool like Lean, Coq, or Isabelle acts as a judge. You write a proof — a sequence of logical steps leading from axioms to the conclusion — and the proof assistant checks every single step against its formal rule system. If any step is invalid, even by a hair, the entire proof is rejected. There is no “close enough.”
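To make the "judge" idea concrete, here is a minimal Lean 4 sketch. It is illustrative only: the theorem name zero_add_example is made up for this example, while Nat.zero_add is a lemma from Lean's core library.

```lean
-- Accepted: both sides compute to the same number, so `rfl` checks.
example : 2 + 2 = 4 := rfl

-- Accepted: a statement about every natural number n,
-- justified by the core library lemma `Nat.zero_add`.
theorem zero_add_example (n : Nat) : 0 + n = n := Nat.zero_add n

-- Rejected if uncommented: 2 + 2 does not reduce to 5,
-- so Lean reports an error instead of accepting the proof.
-- example : 2 + 2 = 5 := rfl
```

The last line is the important one: there is no way to phrase the false claim persuasively enough to get it through the checker.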

Why LLMs Alone Fail at This: A language model generates plausible-looking proofs. Plausible is not the same as correct — a model can hallucinate a logical step that sounds valid in natural language but violates a formal axiom. This is the core reason AlphaProof Nexus pairs an LLM with Lean: the LLM proposes, Lean verifies. The LLM’s creativity is channeled by the verifier’s rigor.
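The propose-and-verify pattern can be sketched in a few lines of Python. This is a toy under stated assumptions, not AlphaProof Nexus: the "claim" is that a number has a nontrivial factor, a random guesser stands in for the LLM, and an exact divisibility check stands in for Lean. The function names are hypothetical, and the default budget of 3,000 attempts simply mirrors the episode cap reported in the preprint.

```python
import random
from typing import Optional


def verify(n: int, candidate: int) -> bool:
    """Exact checker (stand-in for Lean): accept only a real nontrivial factor."""
    return 1 < candidate < n and n % candidate == 0


def propose(n: int, rejected: set, rng: random.Random) -> Optional[int]:
    """Stand-in for the LLM: guess a candidate, skipping ones already rejected."""
    remaining = [c for c in range(2, n) if c not in rejected]
    if not remaining:
        return None  # nothing left to try
    return rng.choice(remaining)


def search_for_witness(n: int, max_episodes: int = 3000, seed: int = 0) -> Optional[int]:
    """Alternate proposing and verifying until a candidate is accepted."""
    rng = random.Random(seed)
    rejected: set = set()
    for _ in range(max_episodes):
        candidate = propose(n, rejected, rng)
        if candidate is None:
            return None  # every candidate was rejected: no witness exists
        if verify(n, candidate):
            return candidate  # only verified answers ever leave the loop
        rejected.add(candidate)  # the rejection is fed back to the proposer
    return None


if __name__ == "__main__":
    print(search_for_witness(91))  # a nontrivial factor of 91
    print(search_for_witness(97))  # None, because 97 is prime
```

However unreliable the proposer is, a wrong answer cannot be returned, because the only exit that produces a result goes through verify. A bad proposer costs time, not correctness.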

Why It Matters Beyond Math: Formal verification is the gold standard in cryptographic protocol design (TLS, zero-knowledge proofs), smart contract auditing, and safety-critical hardware and software (Intel processors, aerospace software). A formally verified zero-knowledge protocol isn’t merely free of known edge cases; it comes with a mathematical proof that no input covered by the specification can violate the stated property. The cost of getting this wrong in any of these domains is not a failed test suite; it’s a billion-dollar hack or a plane crash.

Example

Without formal verification — a smart contract:

Solidity: "Only the owner can withdraw funds"
Developer tests: 15 test cases, all pass
Audit: firm reviews code — looks correct
Deploy: $200M in funds stored
Reality: edge case in reentrancy guard allows attacker to drain contract
Cost: $200M lost in 3 hours

With formal verification — same contract:

Specification: ∀ caller ≠ owner: withdraw(caller) → REVERT
Proof: every execution path from withdraw() where
       caller ≠ owner reaches REVERT — verified in Lean
Deploy: $200M in funds stored
Attack: attempted reentrancy → impossible by proof
Cost: $0

Formal verification doesn’t just find bugs: it proves that, relative to the specification, they cannot exist. That is a fundamentally different and much stronger guarantee than any amount of testing can provide, though it is only as good as the specification itself.
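As a rough illustration of what the specification line in the example could look like in Lean 4, here is a toy model. Everything in it is an assumption made for this sketch: the Outcome type, the withdraw function, and the theorem name are hypothetical, callers are represented as plain numbers, and the model ignores balances, state, and reentrancy, which real contract verification has to handle.

```lean
-- Toy model only; this is not Solidity or EVM semantics.
inductive Outcome where
  | ok
  | revert

-- Hypothetical access check: only the owner's call succeeds.
def withdraw (owner caller : Nat) : Outcome :=
  if caller = owner then Outcome.ok else Outcome.revert

-- Specification: for every caller different from the owner, withdraw reverts.
theorem withdraw_reverts_for_non_owner (owner caller : Nat)
    (h : caller ≠ owner) : withdraw owner caller = Outcome.revert := by
  unfold withdraw
  exact if_neg h
```

The theorem quantifies over all callers at once, which is what the 15 test cases in the first scenario cannot do. The proof says nothing about behavior the model leaves out.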

2. TOP 3 DEVELOPMENTS

Google DeepMind's AlphaProof Nexus Solves 9 Erdős Problems — With Mathematical Certainty, For $200 Each

Google DeepMind published a preprint on May 21, 2026 documenting AlphaProof Nexus — a system that fuses large language models with the Lean formal proof assistant — autonomously cracking 9 of 353 open Erdős problems and proving 44 of 492 open OEIS conjectures, at a per-problem cost of a few hundred dollars, for problems that had in some cases gone unsolved for over 56 years. The architectural insight is the concept explainer brought to life: AlphaProof Nexus pairs LLM generation with Lean's formal compiler in a tight feedback loop — the LLM proposes a proof path, Lean checks every logical step and rejects anything that doesn't hold, the rejection is fed back as a signal, and the agent tries again — eliminating hallucination not by making the model less likely to hallucinate, but by making hallucinations mechanically impossible to pass through the verifier. The most striking finding in the paper is also the most useful for developers thinking about agent design: a basic agent that simply alternates LLM generation with compiler feedback replicated all 9 Erdős successes — the full-featured system with evolutionary search and reinforcement learning only provided meaningful advantages on the hardest problems, suggesting that as foundation models improve, simple agentic loops with formal verification are catching up to complex specialized architectures. Beyond the benchmark results, AlphaProof Nexus resolved a 15-year-old open question in algebraic geometry and discovered a novel algorithmic parameter in optimization theory that humans hadn't found — and all proofs have been published on GitHub and logged on Terence Tao's wiki on AI contributions to Erdős problems.

Details & Specifications

Training/Architecture:

Hybrid LLM + formal verifier pipeline; four agent configurations tested (A–D) ranging from simple LLM-Lean alternation to full evolutionary search + AlphaProof + reinforcement learning; "full-featured" agent (D) used for Erdős and OEIS evaluation runs; LLM subagents propose proof paths; Lean acts as formal verifier checking and rejecting each logical step; 3,000 episodes per problem maximum before termination; all Lean statements from open-source Formal Conjectures repository; builds directly on AlphaProof (2024 IMO silver-medal system)

Models/Checkpoints:

AlphaProof Nexus (research system, not commercially available); powered by Gemini (specific variant not disclosed in preprint); all Lean proofs and selected natural language versions published at github.com/google-deepmind/alphaproof-nexus-results; preprint: arXiv 2605.22763v1

Performance:

9/353 open Erdős problems solved (agent ran on all 353 formalized problems — selection not cherry-picked); 44/492 open OEIS conjectures proved (~9%); domains: combinatorics, graph theory, algebraic geometry, optimization theory, quantum optics, additive combinatorics; resolved 15-year-old algebraic geometry question; discovered novel algorithmic parameter in optimization theory; results logged on Terence Tao's Erdős wiki; per-problem cost: a few hundred dollars
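For a sense of scale, the success rates implied by the counts above can be worked out directly. The snippet below is plain arithmetic on the figures reported in this section; the function name is just for illustration.

```python
def success_rate(solved: int, attempted: int) -> float:
    """Share of attempted problems that were solved, as a percentage."""
    return 100.0 * solved / attempted


erdos_rate = success_rate(9, 353)   # open Erdős problems
oeis_rate = success_rate(44, 492)   # open OEIS conjectures

print(f"Erdős: {erdos_rate:.1f}%")  # prints "Erdős: 2.5%"
print(f"OEIS: {oeis_rate:.1f}%")    # prints "OEIS: 8.9%"
```

So most problems in both sets remain open; the headline is that any of them fell to an automated system at this cost.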

OpenAI Creates $445,000 Safety Role to Track Whether AI Is Automating Its Own Developers

OpenAI posted a Preparedness team research position paying $295,000–$445,000 for a researcher to study recursive self-improvement — defined as an AI system's ability to research, design, and train better versions of itself without meaningful human involvement — with the job listing explicitly noting the role "relies on reasoning about problems that might exist in the future, but might not exist now." The responsibilities reveal exactly what OpenAI is measuring internally: the researcher will track progress toward automation of technical staff — effectively measuring how much of OpenAI's own engineering work AI is already doing — implement data poisoning defenses, and design experiments to detect whether AI models may be developing dangerous capabilities ahead of corresponding safety measures. The "tasteful and strategic" phrasing in the listing has attracted wide attention; the posting explains it directly: because the work involves reasoning about uncertain, forward-looking risks, the researcher must navigate internal politics around a subject that could attract regulatory and media scrutiny, making judgment as important as technical skill. The external timeline context makes this hire urgent: OpenAI CEO Sam Altman has set a public benchmark of an "automated AI research intern" by September 2026 and a "true automated AI researcher by March 2028," while DeepMind CEO Demis Hassabis has publicly described humanity as being "in the foothills of the singularity" and predicted human-level AI within five years capable of sparking a transformation ten times faster than the industrial revolution.

Details & Specifications

Role details:

Salary: $295,000–$445,000; team: Preparedness (OpenAI's frontier risk team); title: Research Scientist, Preparedness; location: San Francisco; seniority: compensation in line with a senior ML engineer role

Core responsibilities:

Modeling abstract, long-horizon security risks from recursive self-improvement; defending models against data poisoning attacks; building interpretability tools to audit model activations and internal reasoning; designing experiments to detect misalignment or dangerous capability development; tracking internal metrics on automation of technical staff — measuring how much of OpenAI's own R&D AI is performing

Industry timeline context:

Sam Altman: automated AI research intern by September 2026, true automated AI researcher by March 2028; Demis Hassabis: human-level AI within 5 years; Jack Clark (Anthropic): 60%+ odds AI trains its own successors by end-2028; METR data: AI time-horizon at 12 hours now, 100-hour runs projected by end-2026

Jeff Bezos Reveals Project Prometheus Is Building an "Artificial General Engineer" — Not Robots

In a live CNBC interview on May 20, 2026, Jeff Bezos offered the first substantive public description of Project Prometheus — correcting widespread speculation that the company was building humanoid robots or factory automation — and clarifying that it is developing what he called an "artificial general engineer": a next-generation AI system for designing physical objects, which he described as "a very, very modern version of CAD," while adding that even that framing undersells the company's actual ambitions. The distinction Bezos is drawing matters for developers: traditional CAD software lets a human engineer sketch a design and checks whether it violates physical constraints; an artificial general engineer would close the loop entirely — generating, optimizing, and formally verifying designs autonomously, without a human steering each step. Project Prometheus was founded in November 2025 by Bezos and former Google X scientist Vikram Bajaj, operates from San Francisco, London, and Zurich with approximately 120 employees recruited from OpenAI, DeepMind, Meta, and xAI, and has raised $16.2 billion total — a $6.2 billion launch round followed by a $10 billion round anchored by JPMorgan and BlackRock at a $38 billion valuation. The broader Bezos ambition disclosed separately but in the same period: Bezos is also pursuing up to $100 billion in capital to acquire manufacturing firms across chipmaking, aerospace, and defense sectors and automate them with AI — with Prometheus's engineering AI as the software layer sitting on top of acquired physical production capacity. No product has been shipped, no model has been demonstrated publicly, and Bezos himself described the company as still "operating quietly behind the scenes."

Details & Specifications

Training/Architecture (goal):

No public technical details; described as AI for designing physical objects, going far beyond current CAD; Bezos described it as a "very, very modern version of CAD" but stressed this undersells actual scope; team hires from OpenAI, DeepMind, Meta, xAI suggest frontier model training capability; co-CEO Vik Bajaj background: chemist, physicist, Google X, Foresite Labs (AI biotech incubator) — pointing toward physics-grounded world models for engineering simulation

Models/Checkpoints:

No models released; no products shipped; stealth operation; all current activity is research and hiring; no public release timeline given

Funding:

$6.2 billion Series A (launch, November 2025, led by Bezos); $10 billion Series B (April 2026, anchored by JPMorgan and BlackRock, no lead investor); $38 billion post-money valuation; total committed capital: $16.2 billion; additional $100 billion manufacturing acquisition fund reportedly being pursued separately

3. AI CAREER OPPORTUNITIES

1. AI Research Engineer (Model Compression & Quantization)
📍 Tether | Remote | Full-Time
🔗 Apply Here

2. Founding Engineer, Voice AI
📍 Kosha AI | Remote | Full-Time
🔗 Apply Here

3. AI Forward Deployed Engineer (Remote)
📍 grail.computer | Remote | Full-Time
🔗 Apply Here

4. AI-Native Software Engineer
📍 Presolv360 | Remote | Full-Time
🔗 Apply Here

We track real AI shifts: with facts, without hype
• The most important daily AI advancements, summarized with tech specs
• One critical AI concept explained in simple terms
• Curated AI jobs and projects, all remote-friendly
To make the most of AI, subscribe to the newsletter and share it with other AI professionals.

Stay connected:
Deep Tech Stars Web/App | WhatsApp: AI Jobs | WhatsApp: AI Discussions | LinkedIn | Instagram

Deep Tech Stars
