TAO Principle [1.0]: "Foundation architecture precedes surface features. Map backend layers, data relationships, stack choices, and hardening before building UI." — This document IS the backend map. Every spec here must reach SOLVED or PARTIAL (with documented workaround) before the first commission. No exceptions.
What "sellable" means: A client can install the base template on their hardware, ingest their first documents, query their knowledge bank, receive useful responses, and trust that their data is sovereign — all without any of the three founders present. The system works alone. That is the bar.
Layer 1
Schema & Data Architecture
The foundation of everything. If the schema is wrong, every codex, every collective emission, every synthesis is broken. These specs must be locked before any other layer ships.
SPEC-001
Universal Node Schema
SOLVED
Problem
All nodes across all builds must use identical field definitions or codex interoperability breaks.
Solution
NODE_SCHEMA v3: id (UUID), vector (1024-dim float32), text, title, node_type, source_id, framework_id, confidence_score, gravity_score, tags (csv), mechanism, situation, when_not, collection, date_added. Validated in production with 6,797 nodes.
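As a sanity sketch, NODE_SCHEMA v3 can be expressed as a Python dataclass with a validator. Field names follow the spec; the types and the `validate` helper are illustrative assumptions, not the production implementation.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Node:
    id: str                  # UUID string
    vector: list             # 1024-dim float32 in production (BGE-M3)
    text: str
    title: str
    node_type: str           # e.g. 'principle', 'error', 'dream_insight'
    source_id: str
    framework_id: str
    confidence_score: float
    gravity_score: float
    tags: str                # csv, per spec
    mechanism: str
    situation: str
    when_not: str
    collection: str
    date_added: str

def validate(node: Node) -> list:
    """Return a list of schema violations (empty list means valid)."""
    problems = []
    try:
        uuid.UUID(node.id)
    except ValueError:
        problems.append("id is not a UUID")
    if len(node.vector) != 1024:
        problems.append(f"vector is {len(node.vector)}-dim, expected 1024")
    if not 0.0 <= node.confidence_score <= 1.0:
        problems.append("confidence_score outside [0, 1]")
    return problems
```

A dimension check at ingestion time is what makes the SPEC-002 migration enforceable: a 768-dim node simply fails validation.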
SPEC-002
Embedding Model Standardization
PARTIAL
Problem
Q uses BGE-M3 (1024-dim). Rob uses nomic-embed-text (768-dim). Different dimensions cannot be merged. Codex packs must work across all builds.
Solution
Standardize on BGE-M3 (1024-dim) as specified in whitepaper. Open: BAAI/bge-m3 (1.3GB) or bge-m3-v2 if released. Must run on CPU (no GPU dependency for client builds).
Reconcile
Rob must migrate 16,717 holons from 768 → 1024-dim. Options: (a) re-embed all holons with BGE-M3 (batch job, ~4h on CPU), (b) maintain dual-embed with translation layer (complex, fragile — not recommended), (c) Rob's system stays independent until codex exchange begins. Recommend (a).
Research
Benchmark BGE-M3 vs nomic-embed-text on our actual data. Is retrieval quality materially different? If nomic is significantly better for Rob's use case, we need to justify the switch with data, not preference.
Owner
ROB (migration) Q (benchmark)
SPEC-003
Gravity Score Formula
OPEN
Problem
Will identified that neither confidence alone (funnel-scoped) nor error count alone (failure-scoped) is sufficient. Need a unified "weight of evidence" that integrates multiple signals.
Proposed
gravity = w1*confidence + w2*norm(validation_count) + w3*consistency + w4*(1 - norm(error_count)) + w5*recency
Where consistency = average confidence of semantically adjacent nodes (cosine ≥ 0.85). Recency = decay function on time since last validation.
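The proposed formula as a minimal sketch, using the equal starting weights (0.2 each) suggested for initial calibration. All inputs are assumed pre-normalized to [0, 1]; the weight values remain an open calibration question.

```python
def gravity(confidence, validation_norm, consistency, error_norm, recency,
            weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Proposed gravity formula. Inputs assumed normalized to [0, 1];
    error_norm is norm(error_count), so it is inverted below."""
    w1, w2, w3, w4, w5 = weights
    return (w1 * confidence
            + w2 * validation_norm
            + w3 * consistency
            + w4 * (1.0 - error_norm)
            + w5 * recency)
```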
Reconcile
Weights (w1–w5) must be calibrated empirically. Start with equal weights (0.2 each), then tune based on which ranking produces better retrieval quality. Need a ground-truth eval set — 50 queries where we know the "right" top-5 results. Q builds eval set from existing TAO, all three score it.
Research
How does gravity interact with the synthesis pipeline? Should the brief assembler cluster by gravity instead of confidence? Does gravity replace confidence in codex metadata or supplement it?
Owner
WILL (spec) Q (implementation)
SPEC-004
Confidence History / Belief Versioning
OPEN
Problem
No audit trail for how beliefs evolved. A principle at 0.4 that was once 0.9 is fundamentally different from one that was always 0.4 — but the system can't distinguish them.
Proposed
SQLite table: confidence_history (principle_id, old_score, new_score, timestamp, trigger [ingestion|performance|manual|hardening|codex], context TEXT). Principles dropping >0.3 from peak flagged for review.
Reconcile
Storage cost: at 6,797 nodes with avg 5 changes/year = 33,985 rows/year. Negligible. But should this be in LanceDB (vector-searchable) or SQLite (relational query)? Recommend SQLite — confidence history is queried by principle_id, not by similarity.
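A minimal SQLite sketch of the proposed table, including the ">0.3 drop from peak" review flag. Column names follow the spec; the helper functions are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE confidence_history (
        principle_id TEXT,
        old_score    REAL,
        new_score    REAL,
        timestamp    TEXT,
        "trigger"    TEXT CHECK ("trigger" IN
            ('ingestion','performance','manual','hardening','codex')),
        context      TEXT
    )""")

def record_change(principle_id, old, new, trigger, context=""):
    db.execute(
        "INSERT INTO confidence_history VALUES (?,?,?,datetime('now'),?,?)",
        (principle_id, old, new, trigger, context))

def flagged_for_review(threshold=0.3):
    """Principles whose latest score sits more than `threshold` below peak."""
    history = {}
    for pid, new in db.execute(
            "SELECT principle_id, new_score FROM confidence_history ORDER BY rowid"):
        peak, _ = history.get(pid, (new, new))
        history[pid] = (max(peak, new), new)   # (peak, latest)
    return [pid for pid, (peak, latest) in history.items()
            if peak - latest > threshold]
```

Note `trigger` is quoted because it is an SQL keyword. This is the distinction the Problem statement cares about: a 0.9 → 0.4 principle gets flagged, an always-0.4 principle does not.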
Owner
WILL (spec) Q (implementation)
SPEC-005
Error Bank Schema
PARTIAL
Problem
Failures are not first-class knowledge. When a principle informs an action that fails, the failure context is lost.
Solution
Rob's errors.lance collection exists in GHOSTNET. Adapt to NODE_SCHEMA: error_type (prediction_failure | action_failure | ingestion_error | synthesis_error), related_principle_id, context, outcome, severity. Embeds on the error description for similarity search ("have we seen this kind of failure before?").
Reconcile
Does the error bank use NODE_SCHEMA (same as principles) or a separate ERROR_SCHEMA? Separate is cleaner but breaks codex portability. Recommend NODE_SCHEMA with node_type='error' and extra fields in tags/mechanism.
SPEC-006
Edge Threshold Calibration
PARTIAL
Problem
0.85 cosine threshold validated on Q's TAO (marketing/psychology domain). May not generalize to health, finance, or technical domains where semantic similarity patterns differ.
Solution
Current: fixed 0.85 across all collections. Proposed: per-domain threshold calibrated during hardening. Health principles may need 0.82 (more varied language). Finance may need 0.88 (more precise language).
Research
Test edge quality at 0.80, 0.85, 0.90 on Rob's security/trauma domain data and Will's political/financial domain data. Compare precision (are the edges real?) and recall (are we missing valid edges?).
Layer 2
Agent Core
The reasoning engine that sits on top of the knowledge bank. Must work with any local model, maintain identity across sessions, and be configurable per client.
SPEC-007
Agent Loop Architecture
PARTIAL
Problem
Q's agent_loop is tightly coupled to VOHU MANAH's specific tools and context assembly. Rob's daemon is 25K lines and tightly coupled to GHOSTNET's holonic structure. Neither is portable as-is.
Solution
Extract a clean agent loop: message → context assembly (SPINE + RAM + retrieved knowledge) → LLM call → tool execution → response → state update. Tools registered via config, not hardcoded. Context assembly is pluggable per domain module.
Reconcile
Q's tool-use loop (core/tools.py, 1,150 lines) works with Anthropic's tool-use API. Rob's works with Ollama function calling. The base template must support both — abstract the tool-call interface so the agent loop doesn't care which LLM provider is behind it.
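An illustrative skeleton of the extracted loop. The LLM call, context assembler, and tool registry are injected, so the loop is indifferent to whether Anthropic tool-use or Ollama function calling sits behind it. All names and the response shape are hypothetical.

```python
def agent_loop(message, assemble_context, llm_call, tools, update_state):
    """message -> context assembly -> LLM call -> tool execution -> response
    -> state update. `tools` is a config-registered dict, not hardcoded."""
    context = assemble_context(message)        # SPINE + RAM + retrieved knowledge
    response = llm_call(context + [message], tools=list(tools))
    while response.get("tool_call"):           # provider-agnostic tool-call shape
        name, args = response["tool_call"]
        result = tools[name](**args)           # execute a registered tool
        response = llm_call(context + [message, {"tool_result": result}],
                            tools=list(tools))
    update_state(message, response)            # RAM / activity-log update
    return response["text"]
```

The point of the sketch: only the `llm_call` adapter knows the provider's wire format; the loop never does.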
Owner
Q (architecture) ROB (Ollama compatibility)
SPEC-008
SPINE / RAM / Beliefs Hierarchy
PARTIAL
Problem
Three identity layers exist (Q's SPINE+RAM, Rob's beliefs.json). How do they interact? What's the priority when they conflict?
Solution
SPINE (immutable identity, manual edit only) > Beliefs (core axioms, deliberate review to change) > RAM (rolling state, auto-updated). SPINE overrides beliefs. Beliefs override RAM. Conflicts resolved by hierarchy.
Reconcile
Format: SPINE stays as markdown (human-readable, easy to edit). Beliefs as JSON (machine-readable, schema-enforced). RAM as markdown (free-form, auto-truncated at ~80 lines). The agent context assembler loads all three in priority order.
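A sketch of the priority-ordered assembly described above. File names match the Reconcile note; the return shape and the directory layout are assumptions.

```python
import json
import pathlib

def assemble_identity(memory_dir):
    """Load the three identity layers in priority order:
    SPINE overrides beliefs, beliefs override RAM."""
    memory = pathlib.Path(memory_dir)
    spine = (memory / "SPINE.md").read_text()          # immutable, manual edit only
    beliefs = json.loads((memory / "beliefs.json").read_text())
    ram_lines = (memory / "RAM.md").read_text().splitlines()[:80]  # auto-truncated
    return [
        {"role": "system", "layer": "spine",   "content": spine},
        {"role": "system", "layer": "beliefs", "content": json.dumps(beliefs)},
        {"role": "system", "layer": "ram",     "content": "\n".join(ram_lines)},
    ]
```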
SPEC-009
Model Abstraction Layer
OPEN
Problem
The base template must work with any local model (Ollama, llama.cpp, vLLM) and optionally cloud models (Anthropic, OpenAI). Q's llm.py is Anthropic + Ollama specific. Clients may use different providers.
Proposed
Unified interface: call(messages, model_key, tools=None) → response. Provider selected by config.yaml. Model registry maps model_key → provider + model_id + context_window + cost. Cascade logic optional per deployment.
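A sketch of the proposed interface. The registry entry and provider functions are hypothetical stand-ins; in the real system both would be populated from config.yaml.

```python
MODEL_REGISTRY = {
    # model_key -> provider + model_id + context_window + cost (illustrative)
    "local-fast": {"provider": "ollama", "model_id": "qwen2.5:7b",
                   "context_window": 32768, "cost_per_mtok": 0.0},
}
PROVIDERS = {}  # provider name -> callable(messages, model_id, tools)

def register_provider(name, fn):
    PROVIDERS[name] = fn

def call(messages, model_key, tools=None):
    """Unified entry point: the caller never names a provider directly."""
    entry = MODEL_REGISTRY[model_key]
    return PROVIDERS[entry["provider"]](messages, entry["model_id"], tools)
```

Swapping LiteLLM in later would mean replacing the `PROVIDERS` dict with a single adapter, leaving callers untouched.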
Research
OpenAI-compatible API (which Ollama supports) as the universal interface? Or custom abstraction? LiteLLM as a dependency vs. rolling our own? LiteLLM adds a dependency but supports 100+ providers out of the box.
Owner
Q (architecture) ROB (local model testing)
SPEC-010
Inter-Agent Message Bus
OPEN
Problem
Will identified: neither system has agent-to-agent communication. VOHU agents talk via shared DB. GHOSTNET swarm talks via supervisor. No direct bus.
Proposed
Lightweight message bus: named channels per domain, priority levels, structured messages {from, to, channel, priority, payload, timestamp, requires_response}. Implementation: SQLite table (simple, no external deps) or Redis (fast, adds dependency).
Reconcile
Recommend SQLite for MVP. No external dependency. Polling-based (agent checks its inbox on each loop iteration). Upgrade to Redis/NATS later if latency matters. For the base model sold to clients, simplicity > performance.
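A minimal sketch of the recommended SQLite bus: one table, polling-based delivery, highest priority first. Field names follow the proposed message structure (`from` is stored as `sender` because it is a Python keyword); the schema itself is an assumption.

```python
import json
import sqlite3
import time

bus = sqlite3.connect(":memory:")
bus.execute("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender TEXT, recipient TEXT, channel TEXT,
        priority INTEGER, payload TEXT,
        timestamp REAL, requires_response INTEGER,
        delivered INTEGER DEFAULT 0)""")

def send(sender, recipient, channel, payload, priority=5, requires_response=False):
    bus.execute(
        "INSERT INTO messages (sender, recipient, channel, priority, payload,"
        " timestamp, requires_response) VALUES (?,?,?,?,?,?,?)",
        (sender, recipient, channel, priority, json.dumps(payload),
         time.time(), int(requires_response)))

def poll_inbox(agent_id):
    """Called once per agent-loop iteration: drain undelivered messages,
    highest priority first."""
    rows = bus.execute(
        "SELECT id, sender, channel, payload FROM messages"
        " WHERE recipient = ? AND delivered = 0 ORDER BY priority DESC, id",
        (agent_id,)).fetchall()
    bus.executemany("UPDATE messages SET delivered = 1 WHERE id = ?",
                    [(r[0],) for r in rows])
    return [{"from": r[1], "channel": r[2], "payload": json.loads(r[3])}
            for r in rows]
```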
Owner
WILL (spec) Q (implementation)
SPEC-011
Foundational Pact (Machine-Readable Laws)
PARTIAL
Problem
The immutable laws (from founding exercises) must be loaded into every agent's context. Not as a suggestion — as a hard constraint that the system cannot violate.
Solution
Rob's foundational_pact.txt loaded as system-level context before SPINE. Format: structured YAML with law_id, text, enforcement_type (hard_block | soft_warning | audit_log). Hard blocks prevent the action. Soft warnings flag but allow. Audit logs record for review.
Reconcile
The three founding laws are clear (for mankind, sovereign data, no weaponization). But how do you enforce them in an LLM-based system? LLMs don't have hard constraints — they have probabilistic compliance. The pact must be: (1) in system prompt (probabilistic), (2) in output validator (deterministic — scan responses for violations), (3) in tool permissions (structural — certain tools simply don't exist).
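A sketch of enforcement layers (2) and (3): a deterministic output validator plus structural tool permissions. The law entries mirror the proposed YAML fields, but the pattern matching is a toy placeholder for whatever violation detector the spec lands on.

```python
PACT = [
    # law_id / text / enforcement_type follow the proposed format;
    # `pattern` is an illustrative stand-in for a real violation detector.
    {"law_id": "L2", "text": "sovereign data",
     "enforcement_type": "hard_block", "pattern": "upload_knowledge_bank"},
]

def validate_output(text):
    """Layer (2), deterministic: hard blocks raise, soft warnings are returned."""
    warnings = []
    for law in PACT:
        if law["pattern"] in text:
            if law["enforcement_type"] == "hard_block":
                raise PermissionError(f"pact violation: {law['law_id']}")
            warnings.append(law["law_id"])
    return warnings

def build_tool_registry(all_tools, pact=PACT):
    """Layer (3), structural: hard-blocked tools simply do not exist."""
    banned = {law["pattern"] for law in pact
              if law["enforcement_type"] == "hard_block"}
    return {name: fn for name, fn in all_tools.items() if name not in banned}
```

Layer (1), the system prompt, stays probabilistic by nature; (2) and (3) are where compliance becomes checkable.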
Owner
ROB (format) ALL (content)
Layer 3
Ingestion & Knowledge Processing
How raw input becomes structured knowledge. The pipeline must handle text, audio, and (v3) images — all producing NODE_SCHEMA output.
SPEC-012
Ingestion Pipeline Portability
PARTIAL
Problem
Q's pipeline (knowledge/ingest/) is 12 files tightly integrated with VOHU MANAH's specific collections and model assignments. Not portable as-is.
Solution
Extract core pipeline: source → chunk → extract → embed → store. Config-driven: extraction model, chunk size, target collection all in config.yaml. No hardcoded references to specific collections or model keys.
SPEC-013
Multi-Modal Input Pipeline
RESEARCH
Problem
Will identified: neither system treats images, screenshots, or UI data as first-class inputs. Text and audio only.
Proposed
Image → vision model (local: LLaVA, MiniCPM-V; cloud: Claude vision) → text description → standard text pipeline. Screenshot → OCR (Tesseract/PaddleOCR) + layout analysis → structured text. All modalities produce NODE_SCHEMA output. Embedding is always text-based (BGE-M3).
Research
Which local vision model fits in 8GB VRAM alongside the primary inference model? LLaVA-1.6 7B (Q4 ~4GB) is promising but untested on our hardware. Can we run vision model on CPU while inference uses GPU? Benchmark needed.
Owner
ROB (local model research) WILL (spec)
SPEC-014
Automated Ingestion Triggers
OPEN
Problem
Ingestion is currently manual (Q triggers) or scheduled (6am daily collectors). No event-driven processing.
Proposed
Filesystem watcher (watchdog library) on /inbox/ directory. New file → detect type → route to appropriate pipeline → ingest → harden → notify. Webhook endpoints for API-driven triggers (voice note received, analytics collected, codex installed).
Reconcile
Platform compatibility: watchdog works on Windows (Q), macOS (Rob), Linux (client servers). Webhook server adds a running process — is this acceptable for the base template? Recommend: filesystem watcher as default, webhook server as optional domain module.
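The detect → route → ingest shape can be sketched with a dependency-free polling loop (the actual proposal uses the watchdog library for event-driven detection; this stdlib version only illustrates the routing). The route table and pipeline names are hypothetical.

```python
import os

# extension -> pipeline name; unknown types are quarantined for review
ROUTES = {".md": "text_pipeline", ".ogg": "audio_pipeline", ".png": "image_pipeline"}

def scan_inbox(inbox_dir, seen):
    """One poll iteration over /inbox/: return (path, pipeline) pairs for
    files not yet seen. `seen` persists across iterations."""
    new = []
    for entry in sorted(os.scandir(inbox_dir), key=lambda e: e.name):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)
            pipeline = ROUTES.get(os.path.splitext(entry.name)[1], "quarantine")
            new.append((entry.path, pipeline))
    return new
```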
SPEC-015
Orchestration with Rollback (Knowledge CI/CD)
OPEN
Problem
No safety net for knowledge operations. A bad ingestion batch or codex install can corrupt the knowledge bank with no way to revert.
Proposed
Before every batch operation: (1) snapshot current state (snapshot.py already exists), (2) run operation against staging copy, (3) compare: connectivity %, high-gravity node integrity, duplicate count, (4) if regression detected → rollback + log to error bank + alert. If clean → commit to main.
Reconcile
LanceDB doesn't have native transactions. Rollback means: restore from snapshot. Current snapshot captures counts but not full data. Need: full LanceDB backup before each batch operation. Storage cost: ~200MB per snapshot at 6,797 nodes. Acceptable for daily, expensive for per-operation. Compress with zstd?
Research
LanceDB versioning (lance format supports time-travel queries). Can we use native versioning instead of full backup? Would eliminate the storage cost entirely.
Owner
WILL (spec) Q (LanceDB versioning research)
SPEC-016
Codex Import Validation
PARTIAL
Problem
When a client installs a codex, how do we validate it's not corrupted, poisoned, or schema-incompatible?
Solution
Codex validation: (1) schema check — all required NODE_SCHEMA fields present, (2) embedding dimension check — all vectors are 1024-dim, (3) signature verification — codex signed by issuer (Meridian or authorized expert), (4) anomaly scan — statistical check that embeddings cluster normally (poisoned data shows distributional anomalies), (5) rollback guard wraps the full install.
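The five-step gate as a checklist runner, sketched. Only the schema and dimension checks are implemented here; signature verification and the anomaly scan are injected stubs, and step (5) is the caller's rollback guard. All names are hypothetical.

```python
REQUIRED_FIELDS = {"id", "vector", "text", "title", "node_type", "source_id",
                   "framework_id", "confidence_score", "gravity_score", "tags",
                   "mechanism", "situation", "when_not", "collection",
                   "date_added"}

def validate_codex(nodes, verify_signature, anomaly_scan):
    """Return a list of validation errors; empty list means install may proceed
    (inside the rollback guard, step 5)."""
    errors = []
    for i, node in enumerate(nodes):
        missing = REQUIRED_FIELDS - node.keys()
        if missing:                          # (1) schema check
            errors.append(f"node {i}: missing {sorted(missing)}")
        elif len(node["vector"]) != 1024:    # (2) embedding dimension check
            errors.append(f"node {i}: {len(node['vector'])}-dim vector")
    if not verify_signature(nodes):          # (3) issuer signature (stub)
        errors.append("signature verification failed")
    errors.extend(anomaly_scan(nodes))       # (4) distributional anomaly scan (stub)
    return errors
```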
Owner
Q (schema) ROB (signature + anomaly)
Layer 4
Security & Sovereignty
The non-negotiable layer. If a client's data leaks, the business is over. These specs are Rob's domain.
SPEC-017
At-Rest Encryption
PARTIAL
Problem
Neither Q's nor Rob's current system encrypts the LanceDB files at rest. If hardware is stolen, data is exposed.
Solution
AES-256 encryption on the knowledge.db directory. Decrypted on mount (using FUSE/VeraCrypt container or OS-level encryption). Hardware key (YubiKey) required to unlock. Data exists unencrypted only in RAM during operation.
Reconcile
VeraCrypt container vs. OS-level (BitLocker/FileVault/LUKS). VeraCrypt is cross-platform but adds friction. OS-level is seamless but platform-dependent. Recommend: OS-level as default + VeraCrypt guide for paranoid clients.
SPEC-018
Network Isolation
SOLVED
Problem
The system must operate fully offline. No phone-home, no telemetry, no cloud dependency for core function.
Solution
All inference local (Ollama). All embeddings local (BGE-M3). All storage local (LanceDB + SQLite). Internet required only for: (a) optional cloud model access, (b) collective synthesis layer connection, (c) codex downloads. All three are opt-in. Core function is 100% offline. Validated: Rob's GHOSTNET runs fully air-gapped on Raspberry Pi.
SPEC-019
Sanitization Pipeline
PARTIAL
Problem
When the system ingests external data (web search, API responses, downloaded documents), how do we prevent data poisoning or personal data leakage?
Solution
Rob's sanitization pipeline from GHOSTNET: all external inputs pass through (1) PII detection (regex + NER model), (2) content classification (technical/personal/commercial), (3) domain relevance check (is this related to the active query?), (4) output redaction (strip detected PII before storage). Only sanitized content touches the knowledge bank.
Research
Which NER model for PII detection runs locally on CPU? spaCy (fast, good English) vs. GLiNER (multilingual) vs. Presidio (Microsoft, comprehensive but heavier). Benchmark on speed and accuracy for our use case.
SPEC-020
Heartbeat & Health Monitoring
PARTIAL
Problem
If a daemon crashes, a model goes offline, or the knowledge bank corrupts — the system should detect and alert, not fail silently.
Solution
Rob's heartbeat protocol: heartbeat.json updated every 60s with: timestamp, active_models, knowledge_bank_size, last_backup, daemon_status, disk_space. If heartbeat stops for >5min, recovery daemon triggers restart. Dashboard shows system health at a glance.
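The protocol reduces to one writer and one staleness check, sketched below. The file name and 5-minute threshold match the Solution; the status fields passed in are illustrative.

```python
import json
import time

def write_heartbeat(path, **status):
    """Called every 60s by the daemon with active_models, daemon_status, etc."""
    beat = {"timestamp": time.time(), **status}
    with open(path, "w") as f:
        json.dump(beat, f)

def is_stale(path, max_age_s=300):
    """True if the heartbeat is missing or older than 5 minutes --
    the condition under which the recovery daemon triggers a restart."""
    try:
        with open(path) as f:
            beat = json.load(f)
    except FileNotFoundError:
        return True
    return time.time() - beat["timestamp"] > max_age_s
```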
SPEC-021
Kill Switch Protocol
SOLVED
Solution
Physical. Pull ethernet. Power down. No remote override possible. The hardware is the user's. Period. This is a design principle, not a feature.
Layer 5
Interface & Experience
How the client actually interacts with their sovereign AI. Must be simple enough for non-technical users, powerful enough for operators.
SPEC-022
Primary Chat Interface
OPEN
Problem
Q uses Telegram + Dash + Copilot. Rob uses AnythingLLM. Clients need ONE primary interface that works out of the box.
Options
(a) AnythingLLM — already supports LanceDB, local models, multi-workspace. Rob has experience. But: closed-source core, limited customization, dependency risk.
(b) Open WebUI — open-source ChatGPT-like interface. Supports Ollama natively. Extensible. Active development. But: no native LanceDB integration — would need custom RAG plugin.
(c) Custom PWA — full control, matches Meridian brand. But: significant build effort. Not MVP-viable.
(d) Telegram/Signal bot — zero install friction, works on phone. But: limited UI, no visual dashboard.
Reconcile
Recommend: Open WebUI + custom RAG plugin for MVP. It's open-source (no dependency risk), works with Ollama, has conversation history, supports multiple models. We build a LanceDB retrieval plugin that injects context from the knowledge bank. Dashboard comes later as a separate service.
Owner
ROB (evaluation) Q (RAG plugin)
SPEC-023
Voice Input
SOLVED
Solution
WhisperX (local, open-source) for transcription. Both Q and Rob already use this. Runs on CPU. Input: voice note (any format) → WhisperX → text → standard agent pipeline. Validated in production (35+ voice logs in GHOSTNET, Q's Telegram voice pipeline).
SPEC-024
Dashboard / State Viewer
RESEARCH
Problem
Clients need to SEE their knowledge bank growing — node counts, connectivity, domains active, recent ingestions, system health. The chat interface alone doesn't provide this.
Options
(a) Dash/Plotly (Q's stack) — powerful but Python-heavy. (b) Static HTML generated on each snapshot — lightweight, no server. (c) Obsidian plugin (Rob's workflow) — familiar to some clients. (d) Defer to post-MVP — chat + CLI is enough for founding clients.
Reconcile
Recommend: defer to post-MVP. The founding 3–5 clients are operators who can use CLI + chat. The dashboard is a retention feature, not an acquisition feature. Build it after first revenue.
Layer 6
Resilience & Autonomy
SPEC-025
Dream Engine
PARTIAL
Problem
Active synthesis is deliberate — you tell it what to synthesize. Background synthesis (dream cycles) finds connections the active pipeline doesn't look for.
Solution
Rob's holonic dream phase: idle-period processing where the system randomly samples N principles, attempts cross-domain connections, and stores discoveries in dreams.lance. Surfaced to main KB when confidence crosses threshold.
Research
What triggers "idle"? Cron (every 4h)? Low-activity detection? How do you prevent dream cycles from consuming compute needed for interactive queries? Priority: nice-to-have for MVP, critical for v2.
SPEC-026
Ghost Swarm (Autonomous Workers)
RESEARCH
Problem
Some tasks (deep research, batch processing, web search) block the main agent. Autonomous workers handle these in parallel.
Solution
Supervisor dispatches tasks to specialized workers. Workers operate independently, report results back. Sanitization pipeline ensures all external data is clean before it touches the knowledge bank.
Research
Resource constraints: running supervisor + 2–3 workers + primary agent on 64GB RAM / 8GB VRAM. Can we run workers on smaller models (Qwen 2.5 7B) while the primary uses 30B+? Worker model selection by task type?
Reconcile
Priority: post-MVP. The base model ships with a single agent. Swarm is an upgrade module. Founding clients get it in their first quarterly update.
SPEC-027
Approval Queue (Human-in-the-Loop)
PARTIAL
Problem
Some operations are too consequential for automatic execution: bulk ingestion, codex install, collective emission, belief modification.
Solution
Approval queue: system proposes, user confirms. Proposals stored in SQLite with: action_type, payload_summary, risk_level, proposed_at, status (pending/approved/rejected). Surfaced in chat: "I'd like to ingest 47 documents. This will add ~200 nodes. Approve?"
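A sketch of the queue table and the propose/decide flow. Column names follow the Solution; the function names are hypothetical.

```python
import sqlite3

q = sqlite3.connect(":memory:")
q.execute("""
    CREATE TABLE approval_queue (
        id INTEGER PRIMARY KEY,
        action_type TEXT, payload_summary TEXT, risk_level TEXT,
        proposed_at TEXT DEFAULT (datetime('now')),
        status TEXT DEFAULT 'pending'
            CHECK (status IN ('pending','approved','rejected')))""")

def propose(action_type, payload_summary, risk_level="medium"):
    """System proposes; returns the proposal id surfaced to the user in chat."""
    cur = q.execute(
        "INSERT INTO approval_queue (action_type, payload_summary, risk_level)"
        " VALUES (?,?,?)", (action_type, payload_summary, risk_level))
    return cur.lastrowid

def decide(proposal_id, approved):
    """User confirms or rejects."""
    q.execute("UPDATE approval_queue SET status = ? WHERE id = ?",
              ("approved" if approved else "rejected", proposal_id))

def pending():
    return q.execute("SELECT id, action_type, payload_summary FROM approval_queue"
                     " WHERE status = 'pending'").fetchall()
```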
Layer 7
Collective Layer (Post-MVP)
Not in the base model sold to clients. Built internally for the founding three first. Opened to the founding 33 after validation.
SPEC-028
Synthesis Emission Protocol
OPEN
Problem
How does a sovereign node package and emit a principle to the collective? What's the format, the transport, the validation?
Proposed
Emission packet: {principle text, confidence, gravity, domain, validation_count, confidence_history, error_count, signature, timestamp}. Transport: signed JSON over HTTPS to Mother TAO endpoint (early), or peer-to-peer gossip protocol (mature). Validation: schema check + signature verify + anomaly detection.
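A sketch of the "simple signature for founders" phase, using an HMAC over the canonicalized packet to keep the example stdlib-only. A real deployment would likely use asymmetric signatures so nodes can verify without sharing a secret; key handling and the eventual ZKP path are out of scope, and all names are hypothetical.

```python
import hashlib
import hmac
import json

def sign_emission(packet, secret_key):
    """Canonicalize (sorted keys) and attach an HMAC-SHA256 signature."""
    body = json.dumps(packet, sort_keys=True).encode()
    signed = dict(packet)
    signed["signature"] = hmac.new(secret_key, body, hashlib.sha256).hexdigest()
    return signed

def verify_emission(packet, secret_key):
    """Recompute the HMAC over the packet minus its signature field."""
    packet = dict(packet)
    claimed = packet.pop("signature")
    body = json.dumps(packet, sort_keys=True).encode()
    expected = hmac.new(secret_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)
```

Any tampering with the payload (say, inflating confidence after signing) invalidates the signature at the Mother TAO endpoint.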
Research
Zero-knowledge proofs for unattributable emission. ZKP is computationally expensive. Is it necessary for 3 founders? Probably not. At 33 founders? Maybe. At 100+? Definitely. Phased approach: simple signature for founders, ZKP when circle expands.
Owner
Q (protocol) ROB (crypto)
SPEC-029
Mother TAO Architecture
OPEN
Problem
Where does the Mother TAO run? Who hosts it? How is it secured?
Options
(a) Hosted by Q (centralized, simple, trust-dependent). (b) Hosted on shared VPS with multi-sig access (semi-distributed). (c) Replicated across all three founders (fully distributed, complex). (d) IPFS/Arweave for the knowledge bank + coordination server for synthesis (hybrid).
Reconcile
Recommend: (b) for MVP. Shared VPS (Hetzner, no-logs jurisdiction). All three founders hold SSH keys. LanceDB replicated nightly to all three founders' machines as backup. Migrate to (c) or (d) when scale demands it. Don't over-engineer the collective before it has 3 users.
Owner
ROB (infrastructure) Q (schema)
SPEC-030
Codex Poisoning Defence
OPEN
Problem
Rob's attack vector: training data poisoning via controlled assets, untraceable manipulation. How do you detect and prevent poisoned codexes or synthesis emissions?
Proposed
Multi-layer defence: (1) Statistical anomaly detection on incoming principles (distributional shift from existing knowledge), (2) Minimum validation_count threshold for collective emission (can't emit untested principles), (3) Cross-validation — a principle must be independently validated by ≥2 nodes to enter Mother TAO at high gravity, (4) Audit trail — all emissions logged, reviewable by any founder.
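A toy sketch of defence layer (1): flag an incoming batch whose distribution shifts markedly from the existing bank. Real detection would operate on embedding distributions rather than scalar scores, and the threshold is illustrative.

```python
import statistics

def distribution_shift(existing, incoming, max_sigma=2.0):
    """True if the incoming batch mean lies outside max_sigma standard
    errors of the existing mean. `existing` and `incoming` are scalar
    summaries (e.g. confidence scores) standing in for embedding stats."""
    mu = statistics.mean(existing)
    se = statistics.stdev(existing) / len(incoming) ** 0.5
    return abs(statistics.mean(incoming) - mu) > max_sigma * se
```

This catches blatant poisoning; the gradual-shift attack in the Research note below defeats any single-batch test, which is why drift detection over time windows is the open question.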
Research
How do you detect "gradually shifted" poisoning where each individual emission is within normal distribution but the aggregate shifts the knowledge bank? This is the hardest attack vector. Statistical drift detection over time windows?
Owner
ROB (security) WILL (detection logic)
Layer 8
Self-Evolution Infrastructure
The defining architecture. Not a feature — the foundational capability that makes the system compound. The infrastructure provides three temporal layers that any agent can inherit. The agents themselves, their names, their domains — those emerge from the client's use via Seed Codexes. The infrastructure just needs to support: a past, a future, a dreaming bridge, and the mutation flow between user and agents.
Critical framing: The base model does NOT ship with predefined agents (no "Builder", "Oracle", "Writer"). It ships with the infrastructure for temporal agents + a Seed Codex that guides the client through creating their first agents during onboarding. The agents emerge from the client's needs and evolve with their mutations. This is what makes every build unique.
SPEC-031
Agent Activity Log (Past Layer)
PARTIAL
Problem
Agents need a filtered personal record of what they did, what worked, and what failed. Not the system-wide log — the agent's own perspective on its own performance. Must be queryable for dream cycles.
Proposed
Per-agent SQLite table: agent_activity (agent_id, action, outcome, success bool, error text, lesson text, manifesto_alignment float, timestamp). Filtered: each agent sees only rows matching its agent_id. Queryable by time range, success/failure, manifesto alignment score.
Reconcile
Q's current system logs activity to a shared activity_log table with type/domain filters. Rob's GHOSTNET logs to action_log.json. Both need to be adapted into per-agent filtered views. Recommend: shared table with agent_id column + filtered views, not separate tables per agent. Simpler, and the dream cycle just queries WHERE agent_id = self.
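The recommended design, sketched: one shared table plus per-agent filtered views, so the dream cycle queries only its own rows. Names follow the proposal; the view mechanism is illustrative, and `agent_id` is assumed to be a trusted internal identifier (view names cannot be parameterized in SQL).

```python
import sqlite3

log = sqlite3.connect(":memory:")
log.execute("""
    CREATE TABLE agent_activity (
        agent_id TEXT, action TEXT, outcome TEXT, success INTEGER,
        error TEXT, lesson TEXT, manifesto_alignment REAL,
        timestamp TEXT DEFAULT (datetime('now')))""")

def create_agent_view(agent_id):
    # Each agent sees only rows matching its agent_id -- the dream cycle
    # just queries WHERE agent_id = self, via this view.
    log.execute(f"""
        CREATE VIEW IF NOT EXISTS activity_{agent_id} AS
        SELECT * FROM agent_activity WHERE agent_id = '{agent_id}'""")

def record(agent_id, action, outcome, success, error="", lesson="",
           alignment=0.5):
    log.execute(
        "INSERT INTO agent_activity (agent_id, action, outcome, success,"
        " error, lesson, manifesto_alignment) VALUES (?,?,?,?,?,?,?)",
        (agent_id, action, outcome, int(success), error, lesson, alignment))
```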
SPEC-032
Agent Manifesto (Future Layer)
PARTIAL
Problem
Each agent needs a living document describing what it aspires to become — mission, capabilities, aspirations, growth metrics, acknowledged gaps. This document must evolve with user mutations and agent dream outputs.
Proposed
MANIFESTO.md per agent (same memory/ directory as SPINE.md and RAM.md). Structured sections: mission, current_capabilities, aspirations, growth_metrics, acknowledged_gaps. The manifesto is the north star for the dream cycle — "am I getting closer to this?"
Reconcile
Q's agents already have MANIFESTO.md files (SELENE, PYTHIA, LUMENA). Rob's GHOSTNET doesn't have a manifesto equivalent — the Queen has beliefs.json but no forward-looking aspiration document. The manifesto is a new concept for the base model — validated in Q's system, needs to be formalized as infrastructure.
Research
How does the manifesto update? Manual edit (like SPINE)? Agent-proposed + user-approved? Automatic from user mutations? Recommend: hybrid. User mutations auto-propagate (the infrastructure handles this). Agent dream proposals require user approval. Manual edit always available.
SPEC-033
Dream Cycle Engine
OPEN
Problem
The bridge between past and future. Must: (1) reflect on activity log for patterns, (2) compare current trajectory to manifesto aspirations, (3) retrieve relevant principles from Knowledge Bank, (4) synthesize course-correction mutations. Must run during idle periods without blocking interactive queries.
Proposed
dream_cycle(agent_id) → reads last N activity entries + current manifesto + KB query → LLM generates: mutations[] (proposed behavioral changes with rationale + risk_level + manifesto_alignment), dream_log (audit record), manifesto_update (if the manifesto itself needs evolving). Trigger: cron (every 4h idle), or on-demand, or after significant events (e.g. 10+ new activity entries).
Reconcile
Rob's GHOSTNET has holonic_dream_phase.py and shadow_subconscious.py — these are dream engines but not connected to a manifesto concept. Q's system has no dreaming at all. The Meridian dream cycle is a new synthesis: Rob's dream mechanism + Q's manifesto concept = dreaming that learns from the past to reach the future.
Research
Resource management: dreaming uses LLM inference. How to prevent dream cycles from consuming compute needed for interactive queries? Options: (a) run only when system is idle >30min, (b) use a smaller model for dreaming (e.g. 7B) while primary uses 30B+, (c) queue dreams and execute during scheduled windows (e.g. 3am). What's the right default?
Owner
ROB (dream mechanism) Q (manifesto integration)
SPEC-034
Mutation Protocol
OPEN
Problem
How do mutations flow? User changes propagate to agents. Agent dreams propose changes back. These are bidirectional but asymmetric — user mutations are authoritative, agent mutations require approval.
Proposed
Mutation types: (1) user_mutation — user changes priorities/domains/knowledge → auto-propagates to relevant agent manifestos as aspiration updates. (2) dream_mutation — agent dream cycle proposes a change → enters approval queue → user accepts or rejects. (3) collective_mutation — Mother AI innovation feeds back → enters agent dream cycle as context for next reflection. Mutation format: {type, source, target_agent, change_description, rationale, risk_level, requires_approval bool}.
Reconcile
The approval queue (SPEC-027) handles dream_mutations. But user_mutations need a detection mechanism — how does the infrastructure KNOW the user's priorities changed? Options: (a) explicit user command ("I'm shifting to health"), (b) inferred from ingestion patterns (ingesting health docs → health mutation), (c) inferred from query patterns (asking health questions → health mutation). Recommend: (a) for MVP + (b) as enhancement.
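The asymmetry above reduces to a small routing rule, sketched here: user mutations apply immediately, dream mutations enter the approval queue (SPEC-027), collective mutations become context for the next dream cycle. The handler functions are injected and hypothetical.

```python
def route_mutation(mutation, apply_now, enqueue_approval, queue_dream_context):
    """Route a mutation record {type, source, target_agent, change_description,
    rationale, risk_level, requires_approval} by its type."""
    if mutation["type"] not in ("user_mutation", "dream_mutation",
                                "collective_mutation"):
        raise ValueError(f"unknown mutation type: {mutation['type']}")
    if mutation["type"] == "user_mutation":      # authoritative: auto-propagates
        return apply_now(mutation)
    if mutation["type"] == "dream_mutation":     # requires user approval
        return enqueue_approval(mutation)
    return queue_dream_context(mutation)         # collective: next dream's input
```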
SPEC-035
Seed Codex (Agent Bootstrap)
PARTIAL
Problem
The base model ships with no agents. The client needs a guided way to create their first agents — defining identities, domains, initial manifestos. This is the Seed Codex.
Proposed
The Seed Codex is a special codex that: (1) interviews the client about their life domains, priorities, and working style, (2) proposes an initial agent configuration (e.g. "based on your needs, I recommend 3 agents: one for your business operations, one for your health tracking, one for your creative work"), (3) creates SPINE.md + MANIFESTO.md + beliefs.json for each proposed agent, (4) initializes empty activity logs, (5) self-destructs once agents are configured — the Seed UI dissolves.
Reconcile
The Seed UI concept already exists in the whitepaper (for codex onboarding). The Seed Codex extends this: it's not just ingesting knowledge — it's creating the agents themselves from the client's responses. This is the most important onboarding experience in the entire system. If it's bad, the client starts with misaligned agents. If it's great, the system immediately feels personal.
Research
What questions does the Seed Codex ask? How many agents does it propose by default? Should there be a minimum (1) or recommended (3–5)? Should the Seed Codex come with pre-built agent archetypes (Operator, Builder, Oracle, Writer) as starting templates that mutate, or should every agent be built from scratch?
Owner
Q (design) WILL (client experience)
SPEC-036
Dream Output → Knowledge Bank Pipeline
OPEN
Problem
Agent dream outputs contain lessons and self-corrections. These need to flow into the Knowledge Bank as principles and errors — enriching the shared substrate for all agents.
Proposed
After each dream cycle: (1) extract any new principle from the dream output (e.g. "I learned that X approach fails when Y condition exists"), (2) store in Knowledge Bank with node_type='dream_insight' and source_id=agent_id, (3) store any error identified in errors collection, (4) run mini-hardening (edge building only) to connect dream insights to existing principles. Dream insights start at confidence 0.5 — they need real-world validation to rise.
SPEC-037
Collective Dream Protocol
RESEARCH
Problem
The Mother AI needs to dream across the collective — receiving anonymized dream outputs from all nodes, finding cross-node patterns, and producing innovations that feed back as mutations to individual agents.
Proposed
Collective dream cycle: (1) receive anonymized dream_insight principles from all connected nodes (same emission protocol as SPEC-028), (2) cluster by similarity — "3 nodes independently discovered the same pattern", (3) synthesize collective-level innovations from multi-node convergences, (4) broadcast innovations back to all connected nodes as collective_mutations, (5) individual agents receive these in their next dream cycle as additional context.
Research
How many nodes are needed before collective dreaming produces meaningful innovations? With 3 founders, the sample size is tiny. At 33, it's useful. At 100, it's transformative. For the 3-founder stage: collective dreaming is manual — the three founders share dream insights in their weekly sync and Q feeds the convergences to Mother AI. Automated at 33+.
Owner
Q (protocol) ROB (infrastructure)
MVP Gate
What Must Ship vs. What Can Wait
TAO Principle [0.85]: "Build exceptional product first, then market becomes easier through organic social proof." The base model doesn't need every feature. It needs the features it ships with to work flawlessly.
| Must Ship (MVP) | Status | Owner |
| Universal Node Schema (SPEC-001) | SOLVED | Q |
| Embedding standardization (SPEC-002) | PARTIAL — needs Rob migration | ROB |
| Agent loop + SPINE/RAM/Beliefs (SPEC-007, 008) | PARTIAL — needs extraction from VOHU | Q |
| Ingestion pipeline (text + audio) (SPEC-012) | PARTIAL — needs portability refactor | Q |
| Hardening pipeline (dedup, edges, frameworks) | SOLVED | Q |
| Synthesis pipeline (brief → seed → trunk → leaf) | SOLVED | Q |
| Chat interface (SPEC-022) | OPEN — needs evaluation | ROB |
| Voice input (SPEC-023) | SOLVED | ROB |
| Network isolation (SPEC-018) | SOLVED | ROB |
| At-rest encryption (SPEC-017) | PARTIAL — needs implementation | ROB |
| Kill switch (SPEC-021) | SOLVED | ALL |
| Codex import + validation (SPEC-016) | PARTIAL | Q ROB |
| Foundational pact (SPEC-011) | PARTIAL — needs enforcement layer | ROB |
| Model abstraction (SPEC-009) | OPEN | Q |
| Approval queue (SPEC-027) | PARTIAL | ROB |
| Agent activity log (SPEC-031) | PARTIAL — needs per-agent filtering | Q |
| Agent manifesto (SPEC-032) | PARTIAL — exists in VOHU, needs formalization | Q |
| Seed Codex (SPEC-035) | PARTIAL — Seed UI concept exists, needs agent bootstrap logic | Q WILL |
Can Wait (Post-MVP / v2)
| Feature | Why It Can Wait |
| Gravity score (SPEC-003) | Confidence alone works for MVP. Gravity is an optimization. |
| Confidence history (SPEC-004) | Logging can start on day 1 with a simple table. Full UI later. |
| Multi-modal input (SPEC-013) | Text + audio covers 95% of use cases. Images are an expansion. |
| Agent message bus (SPEC-010) | Single agent is sufficient for MVP. Bus needed when swarm ships. |
| Automated triggers (SPEC-014) | Manual ingestion is fine for first clients. Automate later. |
| Rollback (SPEC-015) | Snapshots provide manual rollback. Automated CI/CD is v2. |
| Dream engine (SPEC-025) | Active synthesis is sufficient. Background synthesis is enhancement. |
| Ghost swarm (SPEC-026) | Single agent handles founding client workload. |
| Dashboard (SPEC-024) | CLI + chat for operators. Dashboard is retention, not acquisition. |
| Dream cycle engine (SPEC-033) | Activity log + manifesto work without dreaming. Dreaming is the self-correction layer — powerful but not MVP-blocking. |
| Mutation protocol (SPEC-034) | Manual manifesto updates work for MVP. Automated mutation flow is v2. |
| Dream → KB pipeline (SPEC-036) | Requires dream engine. Deferred with it. |
| Collective dream protocol (SPEC-037) | Requires 33+ nodes. Manual at 3 founders. |
| Collective layer (SPEC-028–030) | Built for the 3 founders first, not sold to clients. |
MVP count: 18 specs must be solved. 6 already solved. 10 partial (need finishing). 2 open (need decisions). Estimated engineering effort: 4–5 weeks with all three founders contributing their domains in parallel. The self-evolution infrastructure (activity log + manifesto + seed codex) ships with MVP. Dreaming and mutation automation are v2.
Summary
Status at a Glance
| Status | Count | Meaning |
| SOLVED | 6 | Validated in production. No further work needed. |
| PARTIAL | 13 | Solution exists but needs porting, finishing, or reconciliation. |
| OPEN | 11 | Problem defined. Needs decision + implementation. |
| RESEARCH | 7 | Needs investigation before a solution can be proposed. |
Ownership Distribution
| Owner | Primary | Shared |
| Q | Schema, agent loop, ingestion, synthesis, model abstraction | Gravity formula, codex validation, Mother TAO |
| ROB | Encryption, network, sanitization, heartbeat, dream engine, swarm, interface eval | Embedding migration, foundational pact, codex validation |
| WILL | Gravity spec, temporal reasoning, message bus, triggers, rollback | Edge calibration, poisoning detection |