
FORGEGOD

A research-backed coding harness for serious software work. Current-source grounding, native audit preflight, explicit review, specialist agents, and local or cloud execution.

$ pip install forgegod
23 Built-in Tools
9 Provider Families
5-Tier Cognitive Memory
24/7 Autonomous
$0 Local Mode

CORE CAPABILITIES

Many coding agents still rely on stale model memory or a single unchecked pass. ForgeGod adds current web research, adversarial review, bounded automation, optional human-readable memory projection, and first-party bridges into external chat runtimes.

Native Hermes and OpenClaw Integration

ForgeGod already exposes a first-party bridge for external chat runtimes. Export Hermes and OpenClaw adapters, keep stable session IDs, and route Telegram, WhatsApp, Discord, or thread-based conversations straight into ForgeGod instead of maintaining fragile prompt glue outside the harness.

Hermes · OpenClaw · Telegram · WhatsApp · Discord · Native runtime delegation
Export adapters
forgegod integrations export-hermes-skill --output ./dist/hermes
forgegod integrations export-openclaw-skill --output ./dist/openclaw
Bridge runtime
forgegod bridge chat --runtime hermes --session-id wa-42 "Continue the checkout fix"
Machine-friendly JSON or text responses, persistent session continuity, and repo-local context reuse are already live in the CLI.
🔎

Researches Before It Codes

The built-in research agent gathers current docs, library guidance, release notes, and security context before code-changing tasks, then retries with targeted troubleshooting if a bad review or stuck turn blocks progress.

ResearchAgent in runtime

Adversarial Plan Review

A second model attacks the plan like a hostile reviewer. Security gaps, deprecated deps, missing edge cases, and weak abstractions surface before coding starts. Actor-critic review is a durable agent pattern because it catches issues single-pass generation misses.

3 rounds, 6 dimensions
🧠

Remembers and Learns From Every Task

Most AI tools forget everything the moment you close them. ForgeGod remembers. It stores what happened on each task, what principles work best, which code patterns succeed, how things connect to each other, and which errors it has solved before. Old memories fade, new ones get stronger. Runtime recall stays in SQLite, and stable memories can also be projected into an optional Obsidian vault.

5 tiers + optional vault projection
🤖

Dedicated Memory Agent

After every task, a specialized AI analyzes what happened and extracts reusable learnings into 5 structured memory tiers. Not heuristic pattern matching - an actual LLM that understands context, identifies principles, catalogs error fixes, and maps causal relationships. It can also export clean human-readable summaries into an optional Obsidian vault.

LLM-powered extraction + vault summaries

Picks the Best AI for Each Job

Connects to 9 provider families and 10 route surfaces, including native OpenAI Codex subscription auth, OpenAI API, Z.AI, and MiniMax. You can run the recommended adversarial split harness, force a single-model profile, or choose explicit OpenAI surfaces: auto, api-only, codex-only, or api+codex.

9 provider families + explicit OpenAI surfaces
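Role-aware routing of this kind can be sketched as a small routing table. The model names, role names, and fallback rule below are illustrative assumptions for the sketch, not ForgeGod's actual configuration:

```python
# Hypothetical sketch of role-aware routing: each pipeline role maps to a
# preferred cloud model, with a free local fallback when no cloud key is
# configured. Model and role names are illustrative, not real defaults.
ROUTES = {
    "plan":   {"cloud": "openai/gpt-5", "local": "ollama/qwen"},
    "code":   {"cloud": "zai/glm", "local": "ollama/qwen"},
    "review": {"cloud": "anthropic/sonnet", "local": "ollama/qwen"},
}

def pick_model(role: str, have_cloud_key: bool) -> str:
    """Return the model for a role, falling back to the free local tier."""
    route = ROUTES[role]
    return route["cloud"] if have_cloud_key else route["local"]
```

The point of the split is that planning, coding, and review can each get the model that suits them, instead of one model doing every job.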

Structured Loops and Hive Execution

Run work sequentially through the Ralph loop, dispatch ready stories through the local hive with isolated worktrees, or prepend bounded subagent analysis before coding. Loop and hive can refresh native audit artifacts first, so blocked repos stop before planning instead of after damage.

Audit-gated loop + hive + subagents
📈

Checks Its Own Work

Every piece of code gets up to 3 attempts. It starts with free local AI and escalates to smarter cloud AI if needed, validating correctness at every step before moving on.

3 attempts, escalating quality
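The 3-attempt escalation ladder can be sketched as a loop that moves up one tier per failed validation. Tier names and the callback shape are illustrative assumptions, not ForgeGod's internal API:

```python
# Hypothetical sketch of the 3-attempt escalation ladder: start on the free
# local tier, escalate to stronger cloud tiers only when validation fails.
LADDER = ["local-free", "cloud-standard", "cloud-premium"]

def run_with_escalation(task, attempt_fn, validate_fn, max_attempts=3):
    """Try a task up to max_attempts times, escalating one tier per failure."""
    for _attempt, tier in zip(range(max_attempts), LADDER):
        result = attempt_fn(task, tier)
        if validate_fn(result):          # only accept validated work
            return tier, result
    return None, None                    # all attempts exhausted
```

Validation gating each attempt is what keeps a cheap first pass from silently shipping broken code.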
💰

You Control the Spending

Set a daily budget and ForgeGod respects it. As you approach your limit, it shifts to cheaper models. Hit the limit? It switches to free local AI or pauses. Run 100% free forever.

Never an unexpected bill

Self-Improving Harness

ForgeGod records outcomes, learns from them, and refines prompts, routing, and execution strategy under explicit safety policy. Improvement is part of the harness, not an undocumented habit.

Outcome-driven refinement
🦬

Terse Mode

Use shorter internal instructions when you want lower token overhead without changing the core workflow. Available across chat, run, plan, and loop surfaces via --terse.

Lower token overhead
🛡

Built-In Security

Workspace-scoped file access, prompt-injection scanning, secret redaction, and two execution tiers. Standard mode stays local with guardrails. Strict mode runs commands in a real Docker sandbox with no network and blocks if Docker or the required image is missing. `forgegod doctor` now checks those prerequisites directly.

Real strict sandbox
🔌

MCP Tool Server

Connect any external tool via the Model Context Protocol. Databases, APIs, Slack, GitHub - ForgeGod can use them all mid-task. Extend its capabilities without forking the code.

Infinitely extensible
📚

Loadable Skills

Pre-built instruction sets for common tasks - Django, FastAPI, React, testing, deployment. Load a skill and the agent follows battle-tested patterns instead of improvising.

Reusable expertise
🎨

Native Audit and Specialist Agents

audit-agent is wired into ForgeGod as a native preflight gate. `forgegod audit`, `forgegod loop`, and `forgegod hive` can refresh audit artifacts, halt on blockers, and expose security, architecture, and plan-risk specialist passes before planning continues. ResearchAgent grounds implementation with current-source briefs. TasteAgent adds optional design review after code review. EffortGate blocks one-pass "good enough" exits without verification.

Native audit + grounding + taste + max-effort
💬

Hermes and OpenClaw Runtime Bridges

ForgeGod now exposes a machine-friendly bridge for external chat runtimes. Export first-party Hermes and OpenClaw adapters, keep stable session IDs, and let Telegram, WhatsApp, Discord, or thread-bound conversations delegate repo work into ForgeGod without prompt-only glue.

External chat runtimes -> ForgeGod

ENGINEERING MOATS

Six engineering decisions that define the current ForgeGod harness.

MOAT 01

Research-Gated Changes

2026 policy: research before meaningful changes

ForgeGod now treats deep 2026 research as part of the repo contract for architecture, dependency, security, benchmark, workflow, and public-claim changes. That policy is also wired into runtime for code-changing work and bad-review recovery, so the harness does not just document the rule - it executes it.

MOAT 02

Two AIs Argue First

Catches common mistakes before they compound

A hostile reviewer AI attacks your plan before coding starts. It scores 6 dimensions: SOTA-ness, security, architecture, completeness, dependencies, and novelty.

MOAT 03

24/7 Autonomous Loop

Fresh context per task - zero context rot

Give it a PRD with 10 stories. Walk away. ForgeGod can execute them through the sequential Ralph loop or the local hive coordinator. Each story gets a clean agent and isolated worktree, so there is no context rot and parallel work does not collide in one shared workspace.

MOAT 04

$0 Mode That Works

Most competitors require paid API keys or subscriptions

Most competitors need API keys or subscriptions. ForgeGod runs 100% free with Ollama. Qwen 3.5 9B fits in 8GB VRAM. When you need cloud power, it auto-escalates, then drops back to free.

MOAT 05

5-Tier Cognitive Memory

SQLite-backed memory with consolidation and recall

ForgeGod keeps task history, lessons learned, code patterns, causal graph edges, and error solutions in a structured SQLite-backed stack. A dedicated MemoryAgent extracts learnings after every task, and optional Obsidian projection keeps a readable human surface without replacing runtime retrieval.

MOAT 06

Budget That Never Surprises

Explicit budgets, downgrade paths, and local fallback

4 budget modes: Normal, Throttle, Local-Only, Halt. Auto-downgrades at 80% of your daily limit. Uses free local models for bulk work, cloud only when needed. Real-time cost tracking per model, per role, per task.
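The mode policy above can be sketched as a single threshold function. The 80% downgrade point and the four mode names come from the text; the function itself is an illustrative sketch, not ForgeGod's implementation:

```python
# Hypothetical sketch of the 4 budget modes: Normal below 80% of the daily
# limit, Throttle from 80%, Local-Only once the limit is hit, and Halt if
# local execution is not allowed either.
def budget_mode(spent: float, daily_limit: float, allow_local: bool = True) -> str:
    if daily_limit <= 0:
        return "local-only" if allow_local else "halt"
    used = spent / daily_limit
    if used < 0.8:
        return "normal"
    if used < 1.0:
        return "throttle"            # auto-downgrade to cheaper models
    return "local-only" if allow_local else "halt"
```

Because the policy is pure arithmetic over tracked spend, the downgrade path is predictable rather than a surprise at the end of the month.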

RECOMMENDED DEPLOYMENT PROFILES

ForgeGod supports distinct operating profiles depending on cost, latency, and review requirements. Start with the smallest setup that matches the stakes, then scale the harness when you need more throughput or stronger review.

FREE

Ollama Only

$0

Great for learning and personal projects. One AI on your own computer does everything. No account needed.

forgegod init --quick · 1 model
TEAM / HIGH STAKES

Cloud + Local + Cloud

~$0.50 per task

Separate planning, coding, and review across multiple models when throughput, traceability, and stronger review matter more than raw cost.

forgegod init · 3+ models

Research consistently shows that scaffolding, tool access, and multi-step review materially change coding-agent performance. ForgeGod uses role-aware routing instead of forcing one model to do every job, and it documents the current OpenAI and GLM/Codex operating surfaces publicly. See the OpenAI surface contract and the GLM + Codex harness notes for the current recommended setups.

5-TIER COGNITIVE MEMORY

One of the deepest memory stacks in an open coding agent. ForgeGod keeps 5 structured layers in SQLite for runtime recall, then projects stable summaries into an optional Obsidian vault when you want a readable knowledge workspace.

📚

Task History

Remembers every task it has worked on: what it did, whether it succeeded, and what files it changed. Keeps 90 days of history.

💡

Lessons Learned

Extracts general rules from experience, like "writing tests first leads to fewer bugs." Rules it keeps seeing get stronger. Rules that stop being true fade away.
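The strengthen-and-fade behavior can be sketched as evidence with time decay. The half-life and the thresholds for the strong/moderate/tentative labels (which match the `forgegod memory` output shown below) are illustrative assumptions:

```python
# Hypothetical sketch of principle reinforcement with decay: each supporting
# observation adds evidence, and evidence decays over time so stale rules
# fade. Half-life and label thresholds are illustrative, not real values.
HALF_LIFE_DAYS = 30.0

def decayed_evidence(evidence: float, days_since_last_seen: float) -> float:
    """Exponential decay: evidence halves every HALF_LIFE_DAYS."""
    return evidence * 0.5 ** (days_since_last_seen / HALF_LIFE_DAYS)

def strength_label(evidence: float) -> str:
    if evidence >= 8:
        return "strong"
    if evidence >= 3:
        return "moderate"
    return "tentative"
```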

How-To Recipes

Stores step-by-step patterns that worked before - like recipes for common coding tasks. Tracks which recipes succeed most often.

🔗

Connections

Maps how files, functions, and concepts relate to each other. Understands cause and effect - like "changing file A usually requires updating file B."

🔧

Error Fixes

When it solves an error, it remembers the fix. Next time the same error appears, it already knows what to do - no wasted time re-discovering the solution.

The learning cycle - every task makes ForgeGod smarter:
Do a Task → Remember What Happened → Extract Lessons → Clean Up & Merge Memories → Recall What Matters → Do the Next Task Better
$ forgegod memory
Memory Health: HEALTHY
Episodic:   47 episodes (last 90 days)
Semantic:   128 principles (23 strong, 89 moderate, 16 tentative)
Procedural: 34 code patterns (avg 87% success rate)
Graph:      256 entities, 89 causal edges
Errors:     19 solutions indexed

Top learnings:
 [STRONG] Write tests before implementation (evidence: 12x)
 [STRONG] Guard clauses prevent deep nesting (evidence: 8x)
 [STRONG] Type hints reduce reflexion rounds by 40% (evidence: 6x)

HOW IT WORKS

1

Describe

Tell ForgeGod what you want to build. It can research current tools, gather docs, and refine the plan before coding when freshness matters.

2

Build

ForgeGod works through each task on its own - writing code, testing it, saving its work, and moving to the next one. It runs around the clock until every task is done.

3

Review

Verified work is diffed and reviewed before it is finalized. Commits are under user control rather than forced automatically by the loop.

BENCHMARKS & RESEARCH

How Much the Tools Matter (SWE-bench)

AI alone, no tools: 45.9%
+ Basic helper tools: 51.8%
+ Smart context tools: 55.4%
+ Full agent (like ForgeGod): 57.0%

Two AIs vs. One AI (Aider Polyglot)

GPT-4o (alone): 49.0%
Sonnet (alone): 53.8%
R1 + Sonnet (pair): 64.0%
GPT-5 (best single): 88.0%

Cost per Task (Typical)

Free local AI only: $0.00
Free local + cloud review: ~$0.05
Full cloud setup: ~$0.50
Premium AI only: ~$2.00
Estimated from public provider pricing (Apr 2026). Subscription-auth surfaces are costed as $0 inside ForgeGod because billing happens outside repo telemetry.

AI Coding Leaderboard (Apr 2026)

Claude Sonnet 4.6: 1062
Gemini 2.5 Pro: 1043
GPT-5: 1038
Gemini 3 Flash: 780

Engine Snapshot (Historical)

Router throughput: High-throughput async engine
Security scans/sec: 72K + AST
File I/O (async + atomic): 1,416/sec
Memory recall (FTS5 + RRF): 10K indexed
Cost tracking + forecasting: 199 w/sec
Historical snapshot from 2026-04-05. Current audited baseline: 654 collected tests, 84/84 stress tests passing, lint passing, and build passing. Historical methodology · Audit
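The memory-recall line above names FTS5 + RRF. Reciprocal rank fusion itself is a standard technique and can be sketched in a few lines; the retriever pairing (FTS5 keyword search plus a second index) and the function shape are illustrative assumptions:

```python
# Minimal sketch of reciprocal rank fusion (RRF): merge ranked result lists
# from several retrievers (e.g. FTS5 keyword search plus a semantic index)
# by summing 1 / (k + rank). k = 60 is the conventional constant.
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not comparable scores, which is why it is a common way to fuse lexical and semantic retrieval without calibrating the two scorers against each other.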

Reliability

Security detection (regex + AST): 100%
Budget enforcement + forecasting: 100%
Fallback chain + cascade routing: 100%
Circuit breaker (half-open + sliding): 100%
Parallel loop + local hive: 100%
Current verified baseline: 667 collected tests, 582 non-stress tests passing plus 1 skipped by default, 84/84 stress tests passing, lint green, bytecode compilation green, package build green, and live CLI smoke checks for forgegod, forgegod run, forgegod hive, forgegod audit, and forgegod obsidian. Opt-in --subagents paths are covered in the targeted suite, and the OpenAI surface matrices still support API-only, Codex-only, or hybrid API+Codex routing. Benchmark bars above remain historical until regenerated from the repaired runner. Audit details

The bottom line: serious coding agents are defined as much by harness design as by model choice. ForgeGod's routing, review, memory, and execution controls are the product surface.
Full benchmark analysis · SWE-bench data · Aider leaderboards

LIVE SHIP LOG

Near-real-time commit feed from main. It auto-refreshes every two minutes, caches the last good result in the browser, and falls back cleanly if GitHub rate-limits the request.

Branch main
Refresh 2 min
Source GitHub REST

Pulled from the public GitHub commits API for waitdeadai/forgegod. Public unauthenticated requests are limited, so the widget reuses a cached feed if GitHub temporarily says no.

REFERENCE APP

A public ForgeGod-built application used to exercise the harness across planning, implementation, review, and mobile UI work. It remains a live reference deployment, not a marketing mockup.

ForgeGod Reference Deployment
LIVE · Reference status
1st · Public deployment
UX · Mobile-ready pass
NEXT · Active iteration

HOW WE COMPARE


Feature | Claude Code | Codex CLI | Aider | Cursor | ForgeGod
Supports web research before coding | - | - | - | - | Recon
Adversarial plan review | - | - | - | - | 3 rounds
2026 research-gated changes | - | - | - | - | ✓
Picks the best AI automatically | - | - | manual | - | ✓
Free local + paid cloud together | - | basic | basic | - | native
Runs 24/7 on its own | - | - | - | - | ✓
Remembers across sessions | flat files | AGENTS.md | - | removed v2.1 | 5-tier SQLite
Auto memory cleanup | AutoDream | - | - | - | AutoDream+
Improves itself over time | - | - | - | - | SICA
Spending controls | - | - | - | - | 4 modes
Retries with smarter AI | - | - | - | - | 3-attempt
Runs 100% free locally | - | - | manual | - | native
Stress tested + benchmarked | - | - | - | - | 34 tests
Open source | partial | Apache 2.0 | Apache 2.0 | - | Apache 2.0

SUPPORTED PROVIDERS

🏠 Ollama · $0 / FREE
OpenAI · API + Codex subscription
Anthropic
Gemini
🌐 OpenRouter
DeepSeek · 22x cheaper
Kimi · Moonshot direct
Z.AI · GLM-5.1 / Coding Plan
MiniMax · M2 / M2.7 · OpenAI-compatible

GET STARTED IN 30 SECONDS

pip install forgegod

Then: forgegod → describe your task

Python 3.11+ • Auto-bootstraps local config on first chat • Add forgegod --terse when you want lower token overhead • Run forgegod evals, forgegod evals --matrix openai-surfaces, forgegod evals --matrix openai-live, forgegod evals --matrix openai-live-compare, or forgegod evals --matrix minimax-live-compare to verify the harness itself • Inspect OpenAI surfaces with forgegod auth explain before syncing defaults • Works on macOS, Linux, Windows • Full docs on GitHub