A research-backed coding harness for serious software work. Current-source grounding, native audit preflight, explicit review, specialist agents, and local or cloud execution.
Many coding agents still rely on stale model memory or a single unchecked pass. ForgeGod adds current web research, adversarial review, bounded automation, optional human-readable memory projection, and first-party bridges into external chat runtimes.
ForgeGod already exposes a first-party bridge for external chat runtimes. Export Hermes and OpenClaw adapters, keep stable session IDs, and route Telegram, WhatsApp, Discord, or thread-based conversations straight into ForgeGod instead of maintaining fragile prompt glue outside the harness.
forgegod integrations export-hermes-skill --output ./dist/hermes
forgegod integrations export-openclaw-skill --output ./dist/openclaw
forgegod bridge chat --runtime hermes --session-id wa-42 "Continue the checkout fix"
The built-in research agent gathers current docs, library guidance, release notes, and security context before code-changing tasks, then retries with targeted troubleshooting if a bad review or stuck turn blocks progress.
ResearchAgent in runtime

A second model attacks the plan like a hostile reviewer. Security gaps, deprecated deps, missing edge cases, and weak abstractions surface before coding starts. Actor-critic review is a durable agent pattern because it catches issues single-pass generation misses.
3 rounds, 6 dimensions

Most AI tools forget everything the moment you close them. ForgeGod remembers. It stores what happened on each task, what principles work best, which code patterns succeed, how things connect to each other, and which errors it has solved before. Old memories fade, new ones get stronger. Runtime recall stays in SQLite, and stable memories can also be projected into an optional Obsidian vault.
5 tiers + optional vault projection

After every task, a specialized AI analyzes what happened and extracts reusable learnings into 5 structured memory tiers. Not heuristic pattern matching - an actual LLM that understands context, identifies principles, catalogs error fixes, and maps causal relationships. It can also export clean human-readable summaries into an optional Obsidian vault.
LLM-powered extraction + vault summaries

Connects to 9 provider families and 10 route surfaces, including native OpenAI Codex subscription auth, OpenAI API, Z.AI, and MiniMax. You can run the recommended adversarial split harness, force a single-model profile, or choose explicit OpenAI surfaces: auto, api-only, codex-only, or api+codex.
9 provider families + explicit OpenAI surfaces

Run work sequentially through the Ralph loop, dispatch ready stories through the local hive with isolated worktrees, or prepend bounded subagent analysis before coding. Loop and hive can refresh native audit artifacts first, so blocked repos stop before planning instead of after damage.
Audit-gated loop + hive + subagents

Every piece of code gets up to 3 attempts. Starts with free local AI, escalates to smarter cloud AI if needed. Validates correctness at every step before moving on.
3 attempts, escalating quality

Set a daily budget and ForgeGod respects it. As you approach your limit, it shifts to cheaper models. Hit the limit? It switches to free local AI or pauses. Run 100% free forever.
Never an unexpected bill

ForgeGod records outcomes, learns from them, and refines prompts, routing, and execution strategy under explicit safety policy. Improvement is part of the harness, not an undocumented habit.
Outcome-driven refinement

Use shorter internal instructions when you want lower token overhead without changing the core workflow. Available across chat, run, plan, and loop surfaces via --terse.
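For example, the same flag fits a one-off run (the task string below is purely illustrative):

```shell
# Lower token overhead on a single run; --terse is documented for
# chat, run, plan, and loop surfaces
forgegod run --terse "tighten the retry logic in the upload client"
```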
Workspace-scoped file access, prompt-injection scanning, secret redaction, and two execution tiers. Standard mode stays local with guardrails. Strict mode runs commands in a real Docker sandbox with no network and blocks if Docker or the required image is missing. `forgegod doctor` now checks those prerequisites directly.
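A minimal preflight sketch before relying on strict mode. `forgegod doctor` is the documented check; the `--strict` flag on the run command is hypothetical and may be spelled differently in the real CLI:

```shell
# Confirm Docker and the required sandbox image are present
forgegod doctor

# Hypothetical strict-mode run; the exact flag name is not documented here
forgegod run --strict "add input validation to the upload endpoint"
```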
Real strict sandbox

Connect any external tool via the Model Context Protocol. Databases, APIs, Slack, GitHub - ForgeGod can use them all mid-task. Extend its capabilities without forking the code.
Infinitely extensible

Pre-built instruction sets for common tasks - Django, FastAPI, React, testing, deployment. Load a skill and the agent follows battle-tested patterns instead of improvising.
Reusable expertise

audit-agent is wired into ForgeGod as a native preflight gate. `forgegod audit`, `forgegod loop`, and `forgegod hive` can refresh audit artifacts, halt on blockers, and expose security, architecture, and plan-risk specialist passes before planning continues. ResearchAgent grounds implementation with current-source briefs. TasteAgent adds optional design review after code review. EffortGate blocks one-pass "good enough" exits without verification.
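A minimal audit-gated sequence using only the commands named above; flags are omitted because the exact options are not documented on this page:

```shell
# Refresh native audit artifacts; blockers halt before planning starts
forgegod audit

# Loop and hive can refresh the same artifacts before dispatching work
forgegod loop
forgegod hive
```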
Native audit + grounding + taste + max-effort

ForgeGod now exposes a machine-friendly bridge for external chat runtimes. Export first-party Hermes and OpenClaw adapters, keep stable session IDs, and let Telegram, WhatsApp, Discord, or thread-bound conversations delegate repo work into ForgeGod without prompt-only glue.
External chat runtimes -> ForgeGod

Six engineering decisions that define the current ForgeGod harness.
ForgeGod now treats deep 2026 research as part of the repo contract for architecture, dependency, security, benchmark, workflow, and public-claim changes. That policy is also wired into runtime for code-changing work and bad-review recovery, so the harness does not just document the rule - it executes it.
A hostile reviewer AI attacks your plan before coding starts. It scores 6 dimensions: SOTA-ness, security, architecture, completeness, dependencies, and novelty.
Give it a PRD with 10 stories. Walk away. ForgeGod can execute them through the sequential Ralph loop or the local hive coordinator. Each story gets a clean agent and isolated worktree, so there is no context rot and parallel work does not collide in one shared workspace.
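A sketch of that hand-off. The `--prd` flag and the file path are hypothetical, shown only to illustrate the sequential-versus-parallel choice:

```shell
# Sequential: one clean agent per story, one story at a time
forgegod loop --prd ./docs/checkout-prd.md   # hypothetical flag

# Parallel: the local hive dispatches ready stories into isolated worktrees
forgegod hive --prd ./docs/checkout-prd.md   # hypothetical flag
```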
Every competitor needs API keys or subscriptions. ForgeGod runs 100% free with Ollama. Qwen 3.5 9B fits in 8GB VRAM. When you need cloud power, it auto-escalates - then drops back to free.
ForgeGod keeps task history, lessons learned, code patterns, causal graph edges, and error solutions in a structured SQLite-backed stack. A dedicated MemoryAgent extracts learnings after every task, and optional Obsidian projection keeps a readable human surface without replacing runtime retrieval.
4 budget modes: Normal, Throttle, Local-Only, Halt. Auto-downgrades at 80% of your daily limit. Uses free local models for bulk work, cloud only when needed. Real-time cost tracking per model, per role, per task.
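One plausible shape for that policy as configuration. Every key and the file path below are hypothetical; this page documents the behavior, not the config format:

```toml
# .forgegod/config.toml (illustrative keys only)
[budget]
daily_limit_usd = 5.00        # hard daily ceiling
throttle_at = 0.80            # shift to cheaper models at 80% of the limit
on_limit = "local-only"       # or "halt" to pause instead of spending
```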
ForgeGod supports distinct operating profiles depending on cost, latency, and review requirements. Start with the smallest setup that matches the stakes, then scale the harness when you need more throughput or stronger review.
| Cost | Models | Best for | Setup |
|---|---|---|---|
| $0 | 1 model | Great for learning and personal projects. One AI on your own computer does everything. No account needed. | `forgegod init --quick` |
| ~$0.05 per task | 2 models (recommended) | A practical default for most repositories: local coding or low-cost execution with cloud review or escalation when correctness matters. | `forgegod init` |
| ~$0.50 per task | 3+ models | Separate planning, coding, and review across multiple models when throughput, traceability, and stronger review matter more than raw cost. | `forgegod init` |
Research consistently shows that scaffolding, tool access, and multi-step review materially change coding-agent performance. ForgeGod uses role-aware routing instead of forcing one model to do every job, and it documents the current OpenAI and GLM/Codex operating surfaces publicly. See the OpenAI surface contract and the GLM + Codex harness notes for the current recommended setups.
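Both surfaces are inspectable with documented commands:

```shell
# Show which OpenAI auth surfaces are active before syncing defaults
forgegod auth explain

# Exercise the explicit surface matrix (auto, api-only, codex-only, api+codex)
forgegod evals --matrix openai-surfaces
```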
One of the deepest memory stacks in an open coding agent. ForgeGod keeps 5 structured layers in SQLite for runtime recall, then projects stable summaries into an optional Obsidian vault when you want a readable knowledge workspace.
- Remembers every task it has worked on: what it did, whether it succeeded, and what files it changed. Keeps 90 days of history.
- Extracts general rules from experience, like "writing tests first leads to fewer bugs." Rules it keeps seeing get stronger. Rules that stop being true fade away.
- Stores step-by-step patterns that worked before - like recipes for common coding tasks. Tracks which recipes succeed most often.
- Maps how files, functions, and concepts relate to each other. Understands cause and effect - like "changing file A usually requires updating file B."
- When it solves an error, it remembers the fix. Next time the same error appears, it already knows what to do - no wasted time re-discovering the solution.
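A quick way to look at both surfaces. The SQLite path and table name are hypothetical; `forgegod obsidian` is the documented vault command:

```shell
# Peek at runtime recall (hypothetical database path and table name)
sqlite3 ~/.forgegod/memory.db "SELECT COUNT(*) FROM task_history;"

# Project stable memories into the optional Obsidian vault
forgegod obsidian
```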
Tell ForgeGod what you want to build. It can research current tools, gather docs, and refine the plan before coding when freshness matters.
ForgeGod works through each task on its own - writing code, testing it, saving its work, and moving to the next one. It runs around the clock until every task is done.
Verified work is diffed and reviewed before it is finalized. Commits are under user control rather than forced automatically by the loop.
Covered surfaces: forgegod, forgegod run, forgegod hive, forgegod audit, and forgegod obsidian. Opt-in --subagents paths are covered in the targeted suite, and the OpenAI surface matrices still support API-only, Codex-only, and hybrid API+Codex routing. Benchmark bars above remain historical until regenerated from the repaired runner. Audit details
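The verification commands referenced above, as a single sequence (assuming ForgeGod is installed):

```shell
# Core suite, then the OpenAI routing matrices
forgegod evals
forgegod evals --matrix openai-surfaces
forgegod evals --matrix openai-live-compare
```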
The bottom line: serious coding agents are defined as much by harness design as by model choice. ForgeGod's routing, review, memory, and execution controls are the product surface.
Full benchmark analysis •
SWE-bench data •
Aider leaderboards
Near-real-time commit feed from main. It auto-refreshes every two minutes, caches the last good result in the browser, and falls back cleanly if GitHub rate-limits the request.
Pulled from the public GitHub commits API for waitdeadai/forgegod. Public unauthenticated requests are limited, so the widget reuses a cached feed if GitHub temporarily says no.
A public ForgeGod-built application used to exercise the harness across planning, implementation, review, and mobile UI work. It remains a live reference deployment, not a marketing mockup.
Swipe horizontally to compare.
| Feature | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Supports web research before coding | - | - | - | - | Recon |
| Adversarial plan review | - | - | - | - | 3 rounds |
| 2026 research-gated changes | - | - | - | - | ✓ |
| Picks the best AI automatically | - | - | manual | - | ✓ |
| Free local + paid cloud together | - | basic | basic | - | native |
| Runs 24/7 on its own | - | - | - | - | ✓ |
| Remembers across sessions | flat files | AGENTS.md | - | removed v2.1 | 5-tier SQLite |
| Auto memory cleanup | AutoDream | - | - | - | AutoDream+ |
| Improves itself over time | - | - | - | - | SICA |
| Spending controls | - | - | - | - | 4 modes |
| Retries with smarter AI | - | - | - | - | 3-attempt |
| Runs 100% free locally | - | - | manual | - | native |
| Stress tested + benchmarked | - | - | - | - | 34 tests |
| Open source | partial | Apache 2.0 | Apache 2.0 | - | Apache 2.0 |
pip install forgegod
Then: forgegod → describe your task
Python 3.11+ • Auto-bootstraps local config on first chat • Add forgegod --terse when you want lower token overhead • Run forgegod evals, forgegod evals --matrix openai-surfaces, forgegod evals --matrix openai-live, forgegod evals --matrix openai-live-compare, or forgegod evals --matrix minimax-live-compare to verify the harness itself • Inspect OpenAI surfaces with forgegod auth explain before syncing defaults • Works on macOS, Linux, Windows
Full docs on GitHub