v1.0 31 March 2026 — Initial principles and decisions
v2.0 31 March 2026 — Scalability architecture, security layer, AI node analysis, step-by-step build guide added
Section 01
Foundation Principles
Non-negotiable architectural principles. Every proposed feature, shortcut, or integration is evaluated against these first.
Section 02
Scalability Architecture — Lessons From Frontier Labs
The frontier labs spent hundreds of millions discovering these architectural patterns, which are now visible in published research and observable system behaviours. The data scientist's job is to reverse-engineer what they learned and apply it at the scale you are actually operating at.
Terminology Precision Note
The whitepaper uses node to mean three different things: (1) a knowledge unit in the knowledge bank, (2) a sovereign personal AI system in the collective layer, and (3) a neural network computational unit in AI architecture discussions. Technical readers will conflate these. Resolution: In collective layer descriptions, replace 'node' with 'sovereign instance' or 'member system'. Reserve 'node' exclusively for knowledge unit. This is a language precision fix required before the whitepaper is shown to technical builders.
Section 03
Security Architecture
Section 04
Architectural Decisions
Section 05
Step-By-Step Build Guide
Ordered by dependency. Each step produces something the next step requires.
Section 06
Open Questions & Amendment Log
Amendment Log
v1.0 31 March 2026 Initial document. 8 foundation principles, 5 architectural decisions, 4-phase build sequence, 5 open questions.
v2.0 31 March 2026 Added: Scalability architecture from frontier lab analysis (Mistral MoE, NeMo retrieval, Venice privacy, Anthropic constitutional AI, OpenAI trust hierarchy, ChatGPT memory anti-patterns, neural network principles). Full security architecture — four threat moments, three security components, query sanitisation layer, secure logging schema, emission queue with privacy controls. Principles 9 and 10 added. Step-by-step build guide Steps 1-13. Decision Records 6 and 7 added. Open questions OQ-05 and OQ-06 added.
01
Sovereignty Is Non-Negotiable — No Data Leaves Your Machine
Why
The manifesto's entire value proposition collapses if your data touches someone else's infrastructure. One change to API terms, one acquisition, one breach — and your sovereign intelligence is gone. You cannot be the antidote to institutional dependency while depending on institutions for your core brain functions.
What
All core functions — embedding, inference, storage, retrieval — run locally. External APIs are permitted only outside the brain loop: drafting non-sensitive external content, one-off research with the web-facing research agent. They are never called from ingestion, knowledge bank query, or any agent reasoning loop.
02
Schema First — The Node Schema Is Immutable
Why
Every principle extracted tonight will live in the knowledge bank for 30 years. A schema change at 50,000 nodes is not a weekend job — it is a full migration that risks data loss and breaks every codex exchange. The interoperability guarantee between sovereign systems depends on every node being in identical format.
What
The universal node schema is the first file created and the last file touched. Every system component writes to and reads from this schema. It never changes. New fields are added only as optional extensions, never as replacements. The schema IS the protocol.
03
Router Before Agents — Every LLM Call Goes Through One Layer
Why
Hardcoding model choices into individual functions means changing a model requires finding every place it is called. Without a router there is no evaluation data — you cannot see which model produced which output, which means you cannot improve model selection. The router is also where decomposition happens: complex requests broken into subtasks, each routed to the optimal model capability.
What
brain/router.py sits between all application code and all LLM providers. Every inference call passes through it. The router selects model based on task type, logs the call, and returns the response. It supports decompose() — breaking complex queries into subtasks routed independently and assembled. Swapping a model is a one-line config change.
04
Two-Stage Ingestion — Chunks Are Not Knowledge
Why
Storing document chunks gives you a search engine over your PDFs. That is not a brain. The difference is that a brain extracts what is true, validates it against what it already knows, and stores the insight — not the paragraph. NeMo research established that retrieval quality improvement matters more than model size improvement for knowledge-intensive tasks.
What
Stage one: chunk and embed raw document, store as node_type=chunk, create source record. Stage two: principle extraction pass — LLM reads chunk clusters and extracts atomic, irreducible insights with mechanism, situation, and when_not populated, stored as node_type=principle. Knowledge bank query layer prioritises principles over chunks in all retrieval.
05
Log Everything From Day One — Evaluation Is Not Optional
Why
Without logging the system cannot improve. You will not know which model made a mistake, which ingestion produced low-quality principles, or which queries returned irrelevant results. Six months of unlogged operation means six months of improvement signal is permanently gone. Critically: logs must never contain natural language queries or assembled context — only hashes and vectors.
What
Every LLM call logged: timestamp, task_type, model_used, tokens, latency, input_hash, output_hash. Every knowledge bank write logged: node_id, source_id, confidence, collection. Every query logged: salted_query_vector (never plaintext), retrieved_node_ids, context_hash (never context), response_hash. Logs live in logs/ as structured JSON. Local, sovereign.
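A hedged sketch of building one such query-log entry; field names follow the schema above, and the hash helper is an assumption about implementation detail.

```python
import hashlib
import time

def _h(s: str) -> str:
    # First 16 hex chars of SHA-256: enough to correlate, not to reconstruct.
    return hashlib.sha256(s.encode()).hexdigest()[:16]

def make_log_entry(salted_vector, node_ids, context, response, model, latency_ms):
    # Note: the plaintext context and response are hashed here and then
    # discarded by the caller; they never reach disk.
    return {
        "timestamp": time.time(),
        "salted_query_vector": salted_vector,
        "retrieved_node_ids": node_ids,
        "context_hash": _h(context),
        "response_hash": _h(response),
        "model_used": model,
        "latency_ms": latency_ms,
    }
```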
06
SPINE Is Constitutional — Not Just Context
Why
Anthropic's Constitutional AI research established that the most reliable alignment mechanism is constitutional constraints evaluated at inference time — a model critiquing its own outputs against a set of principles before returning them. Your SPINE document is not just context that gets diluted as conversations grow. It is a personal constitution with evaluable clauses that every agent output is checked against before being returned to you.
What
SPINE.md contains evaluable clauses: OUTPUT MUST: lead with conclusion not reasoning. Flag uncertainty explicitly. Never soften a critical finding. Every agent runs a constitutional check on its own output before returning. Outputs that fail are revised automatically. SPINE is written by you, not generated, before any agent code is written. It is loaded at maximum trust level — above agent identity, above RAM.
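A toy sketch of the check loop; in the real system each clause would be evaluated by an LLM pass against the output, so the string heuristic below is purely illustrative.

```python
def violates(clause: str, output: str) -> bool:
    # Illustrative predicate: a clause like "OUTPUT MUST lead with the
    # conclusion" is approximated here by rejecting reasoning-first openers.
    if clause == "lead_with_conclusion":
        return output.lstrip().lower().startswith(("because", "first,"))
    return False

def constitutional_check(output: str, clauses: list) -> list:
    # Returns the violated clauses; a non-empty list triggers revision.
    return [c for c in clauses if violates(c, output)]
```

Outputs that fail the check are revised and re-checked before being returned, which is what keeps SPINE constitutional rather than merely contextual.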
07
Dashboard First, Chat Second — The Brain Is Not a Chatbot
Why
If the primary interface is a chat window, the mental model becomes a better ChatGPT. That framing shapes what you ask of it, how you evaluate it, and whether you notice it growing. A compounding intelligence system's primary interface should be the knowledge bank itself — what is in it, how it is growing, where density is developing.
What
Primary UI view: knowledge bank dashboard — node count by collection, confidence distribution, framework index growth, recent ingestions, system health. Chat is a panel that opens alongside the dashboard, not the main frame. This keeps the mental model correct: you are building and querying a brain, not chatting with an assistant.
08
Confidence Is Earned, Not Assigned
Why
A knowledge bank where every node has confidence 0.7 cannot distinguish between a battle-tested insight validated over 14 decisions and a claim read once in a dubious PDF. After 1,000 nodes the gravity score — your primary ranking signal — becomes meaningless if its confidence inputs are arbitrary.
What
Confidence calculated at ingestion from three signals: source type (your own transcript scores highest, third-party validated research scores mid, unverified web content scores lowest), claim strength (hypothesis vs observation vs validated finding, parsed from language), corroboration (does this principle already exist in the knowledge bank — if so, raise existing node confidence rather than creating a duplicate).
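A sketch of combining the three signals; the source scores come from Step 07 below, but the blend weights are placeholders, not tuned values from this document.

```python
# Source-type priors from the ConfidenceScorer spec (Step 07).
SOURCE_SCORES = {"transcript": 0.85, "research": 0.70, "web": 0.45}

def final_score(source_type: str, claim_strength: float, corroboration: float) -> float:
    # Weighted mean: provenance dominates, claim strength and
    # corroboration adjust. Weights here are illustrative.
    source = SOURCE_SCORES[source_type]
    return round(0.5 * source + 0.3 * claim_strength + 0.2 * corroboration, 3)
```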
09
Queries Are Ephemeral — Natural Language Never Persists
Why
Even if your knowledge bank stores only vectors, natural language queries reveal what you are researching, what decisions you are evaluating, who appears in your research by frequency analysis, and what hypotheses you are testing. This is traffic analysis — you do not need to read the message to reconstruct the pattern. The founding security principle is that the system learns from what you know, not from how you asked about it.
What
Natural language queries are embedded immediately in RAM and the plaintext is discarded. The LLM never receives your query text — it receives only the top-K nodes retrieved by your query vector, plus a reconstructed context prompt. Query vectors are salted before any storage. Logs contain only: salted vector, retrieved node IDs, context hash, response hash. Nothing in any log reconstructs your query or reasoning.
10
Build for 100,000 Nodes From Day Zero
Why
Every structural decision — schema fields, index layers, edge weights, retrieval activation, pipeline parallelism — should be made as if the knowledge bank is already at full scale. The cost of building for scale from Day 1 is approximately zero. The cost of retrofitting a system that was not designed for it at 50,000 nodes is existential. This is the lesson from every frontier lab that had to rebuild their retrieval architecture at scale.
What
Two-level index instantiated from Day 1: frameworks/ layer (50-100 cluster summaries, maintained live) and nodes/ layer (atomic principles, growing unbounded). Async pipeline architecture from first ingestion. Non-linear gravity weighting in retrieval. Dynamic edge weights with co-activation history. All four are architectural decisions that cost nothing now and are catastrophically expensive to retrofit.
| SOURCE / LAB | INSIGHT EXTRACTED | APPLIED TO MERIDIAN |
| Mistral — MoE Architecture | Sparse activation: only 2 of 8 expert networks fire per token. Same quality at a fraction of compute. The router is not choosing between models — it is choosing between specialised capabilities at the subtask level. | Router decomposes every complex request into subtasks. Each subtask routes to the optimal capability independently. A brief request = retrieval subtask (fast model) + relevance scoring (mid model) + synthesis (large model) + formatting (fast model). 40-60% more efficient than routing whole query to one model. |
| NVIDIA NeMo — RAG Research | Improving retrieval precision from 60% to 85% improved end-to-end answer quality more than doubling model size. A 7B model with excellent retrieval outperforms a 70B model with poor retrieval on knowledge-intensive tasks. | Two-level hierarchical index: framework summaries (50-100 clusters, stable) and atomic nodes (growing unbounded). Every query hits framework level first to identify relevant knowledge region, then retrieves atomic nodes only within matched domains. Context window fills with high-signal nodes, not random neighbours. |
| Venice — Inference Privacy | Even with local storage, query timing patterns reveal decision cycles, request frequency reveals activity patterns, and query shape reveals knowledge bank structure. The private context must be separated from public inference infrastructure by construction, not by policy. | Query sanitisation layer separates natural language (ephemeral, discarded) from vector (salted, stored). Assembled context is RAM-only, discarded after response, never persisted. Emission timing uses network-aggregate schedule not personal schedule. Volume smoothing prevents spike fingerprinting. |
| Anthropic — Constitutional AI | The most reliable alignment mechanism is constitutional constraints evaluated at inference time — model critiques its own output against a principle set before returning. Works at personal level with a structured SPINE document, not just at model training level. | SPINE restructured as constitutional document with evaluable clauses. Every agent runs constitutional check on output before returning. Violations trigger automatic revision. This is what prevents belief drift — the SPINE is not just context, it is the evaluation standard every output is checked against. |
| OpenAI — Trust Hierarchy | Three-tier hierarchy (system/user/assistant) establishes that different instruction sources have different trust levels and different override permissions. Collapsing these into one context window causes inconsistent agent behaviour because everything has equal weight. | Four-level trust hierarchy: SPINE (immutable, maximum trust), Manifesto (agent-level, high trust), RAM (session-level, medium trust), Query (request-level, low trust). Higher level always wins. An agent cannot be instructed in a query to violate SPINE. This is the architectural prevention of belief drift. |
| ChatGPT Memory — What To Avoid | Flat key-value memory with no weighting, no validation, no forgetting mechanism. Everything accumulates with equal weight. Old memories are dropped arbitrarily when context fills. A preference mentioned once has the same weight as a principle validated 100 times. | Knowledge bank with weighted, validated nodes — not a flat list of observations. Gravity score gives each principle a load-bearing weight based on validation count, confidence trajectory, and error history. A principle validated 14 times has 10x retrieval weight over something extracted from a single PDF. |
| Neural Network Architecture | Three applicable principles: (1) Weighted edges — connections between nodes are not binary, they have weights that evolve with use. (2) Layered processing — deep networks process at multiple abstraction levels simultaneously. (3) Non-linear activation — contribution scales non-linearly with strength, preventing weak signals from drowning strong ones. | Edge weight field in graph schema from Day 1, updated by co-activation history. Retrieval processes framework layer and node layer simultaneously, not sequentially. Non-linear gravity weighting: a principle at gravity 0.95 contributes dramatically more than one at 0.80, not just slightly more. |
| MOMENT | THREAT | RESOLUTION |
| MOMENT 1 — Query Input | Natural language stored in logs, visible to any process with log access. Query text reveals entities, decisions, and research topics by direct reading. | Embed query immediately in RAM. Discard plaintext. LLM receives only retrieved nodes — never the original query text. Pipeline inversion: query → embed → retrieve → reconstruct context → LLM(context only). |
| MOMENT 2 — Retrieval Vector | Even stored vectors are vulnerable. An adversary with the embedding model can reverse-engineer approximate query text by finding nearest neighbours in the model's embedding space. | Query vector salting: add controlled noise (magnitude 0.02) before any storage. Noise large enough to prevent exact reconstruction, small enough to not affect retrieval quality. Unsalted vector used for retrieval, never touches disk. Salted vector stored in log. |
| MOMENT 3 — Context Assembly | Assembled context (retrieved nodes + reconstructed prompt) reveals which principles you are reasoning against. In aggregate, this reveals your decision patterns even without the query text. | Context assembled in RAM only. Discarded immediately after LLM response generated. Log stores only context_hash (SHA-256 first 16 chars) — never the context itself. Sufficient for debugging and evaluation, insufficient for reconstruction. |
| MOMENT 4 — Collective Emission | Emission timing correlated with query timing reveals causation chain. Domain tags re-identify you if you are the only person in the collective researching a specific area. Volume spikes reveal major research events. | Emission queue with 24-48hr randomised jitter. Domain tags generalised one level up for collective (your venture/series-b/founder-led emits as investment/private-markets). Emission rate smoothing caps weekly volume. Re-embedding for emission: fresh vector, not stored vector. |
01
Query Sanitisation Layer
brain/security/query_sanitiser.py
embed_and_discard(query_text) returns vector only. salt_vector(vector) adds controlled noise. reconstruct_prompt(nodes) builds LLM context from retrieved nodes without query text. Never writes query_text to any persistent storage.
02
Secure Logging Schema
logs/evaluation.jsonl
Each entry: timestamp, salted_query_vector, retrieved_node_ids, context_hash (not context), response_hash (not response), model_used, latency_ms. Nothing in the log reconstructs the query, context, or response. Sufficient for evaluation and debugging.
03
Emission Queue With Privacy Controls
brain/collective/emission_queue.py
gravity_threshold_check() queues passing principles. batch_with_jitter() fires on randomised 24-48hr schedule. generalise_tags() coarsens taxonomy for collective. re_embed_for_emission() generates fresh vector. emit() sends packet — no text, no identity, no trace.
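A sketch of the two privacy controls named above, jittered timing and tag generalisation; the taxonomy map and parent-path fallback are assumptions about how the coarsening would work.

```python
import random

# Illustrative taxonomy: specific tag -> its generalised collective form.
TAG_PARENTS = {"venture/series-b/founder-led": "investment/private-markets"}

def generalise_tags(tags: list) -> list:
    # Known mappings first; otherwise drop the most specific path segment.
    return [TAG_PARENTS.get(t, t.rsplit("/", 1)[0] if "/" in t else t)
            for t in tags]

def jitter_delay_hours(rng=random) -> float:
    # Randomised 24-48hr window decouples emission time from query time.
    return rng.uniform(24, 48)
```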
DECISION 01 Embedding Model: BGE-M3 at 1024 Dimensions
Rationale
BAAI/bge-m3 running locally via sentence-transformers. 1024-dimensional embeddings. CPU inference on M5 Pro. First download 1.3GB, then cached locally.
DECISION 02 Vector Database: LanceDB Local File-Based
Rationale
LanceDB running locally, data in brain/knowledge/db/. No server process required. Lance columnar format with native versioning for rollback-on-regression.
DECISION 03 Local LLM: Two-Tier Ollama Configuration
Rationale
Ollama manages local model serving. llama3.1:8b for ingestion extraction and fast queries. 30B parameter model (benchmark required — see OQ-03) for synthesis and complex reasoning.
DECISION 04 No Cloud API Calls in the Brain Loop
Rationale
Anthropic, OpenAI, and any external API are prohibited from ingestion pipeline, knowledge bank query layer, agent reasoning loops, and chat interface when querying personal knowledge.
DECISION 05 First Agent: Intelligence Analyst — Three Modes
Rationale
Intelligence Analyst agent operating in three modes: Ingest (documents in, principles out), Brief (decision in, relevant knowledge out), Monitor (domains defined, signal filtered daily).
DECISION 06 Two-Level Knowledge Index From Day One
Rationale
Framework index (brain/knowledge/frameworks/) and atomic node index (brain/knowledge/nodes/) instantiated simultaneously. Every query hits framework level first, then atomic level within matched domains.
DECISION 07 Secure Logging Schema — Hashes Not Plaintext
Rationale
Evaluation log stores salted query vectors, node IDs, context hashes, and response hashes. Never stores query text, assembled context, or response text.
Step 01
Phase 1A Create Project Structure & Virtual Environment
RUN: mkdir meridian && cd meridian
RUN: mkdir -p brain/{knowledge/{db,frameworks,nodes},ingestion,agents,memory,security,router,collective} ui logs
RUN: python3 -m venv .venv && source .venv/bin/activate
RUN: pip install lancedb sentence-transformers pypdf anthropic fastapi uvicorn python-multipart watchdog tiktoken pydantic rich python-dotenv numpy
The directory structure is the architecture made visible. Every folder corresponds to a system component. Creating it first forces you to think in the right abstractions before writing code.
Step 02
Phase 1A Install Ollama & Pull Local Models
INSTALL: Download Ollama from ollama.com — native M5 Pro support, no configuration needed
RUN: ollama pull llama3.1:8b
RUN: ollama pull llama3.1:70b (or chosen 30B+ model — see OQ-03)
VERIFY: ollama list — both models should appear
llama3.1:8b is your fast model for extraction and routine tasks. The 70B (Q4, ~40GB) uses most of your 48GB but leaves headroom. If memory pressure is a concern, try Qwen2.5:32b (~18GB) first. BGE-M3 downloads automatically on first use via sentence-transformers — 1.3GB, will appear to hang, let it run.
Step 03
Phase 1A Lock The Universal Node Schema
CREATE: brain/knowledge/schema.py
FIELDS: id (UUID), vector (float32[1024]), text, title, node_type, source_id, framework_id
FIELDS: confidence_score (float32), gravity_score (float32), edge_weight (float32)
FIELDS: tags, mechanism, situation, when_not, collection, date_added
VERIFY: Import schema in Python REPL — zero errors before proceeding
Note the addition of edge_weight (float32, default = cosine similarity at edge creation, updated by co-activation history). This field costs nothing now and is catastrophically expensive to add later. This file is now frozen — it is the last time you touch it.
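One way the frozen schema might look as a plain dataclass; the real schema.py would define a LanceDB/pyarrow schema (Decision 02), but the field set is the one specified above. Defaults shown here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional
import datetime
import uuid

@dataclass(frozen=True)  # frozen: the schema never changes, nodes are immutable records
class Node:
    text: str
    title: str
    node_type: str                  # "chunk" or "principle"
    source_id: str
    vector: list                    # float32[1024] in the real store
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    framework_id: Optional[str] = None
    confidence_score: float = 0.0   # earned at ingestion, never a flat 0.7
    gravity_score: float = 0.0
    edge_weight: float = 0.0        # cosine similarity at edge creation
    tags: tuple = ()
    mechanism: str = ""
    situation: str = ""
    when_not: str = ""
    collection: str = "default"
    date_added: str = field(default_factory=lambda: datetime.date.today().isoformat())
```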
Step 04
Phase 1A Build The LLM Router
CREATE: brain/router/router.py
IMPLEMENT: call(messages, task_type, tools=None) → response — routes by task_type
IMPLEMENT: decompose(request) → List[subtask] — breaks complex request into components
IMPLEMENT: log_call(task_type, model, input_hash, output_hash, latency) → None
CONFIG: brain/router/models.yaml — maps task_type → model_id + provider
VERIFY: router.call() returns response. router.log_call() writes to logs/evaluation.jsonl
Task types to define now: EXTRACTION (8B), REASONING (30B+), SYNTHESIS (30B+), FORMATTING (8B), RETRIEVAL_RERANK (8B), CONSTITUTIONAL_CHECK (8B). The router does not need to be smart on Day 1 — it needs to be the single entry point for all LLM calls. Intelligence comes from the evaluation log over time.
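An illustrative shape for brain/router/models.yaml; the model ids are the ones discussed in Step 02, and the final reasoning/synthesis model is pending the OQ-03 benchmark.

```yaml
# brain/router/models.yaml — sketch; task names match the list above.
extraction:           {provider: ollama, model: llama3.1:8b}
reasoning:            {provider: ollama, model: llama3.1:70b}   # pending OQ-03
synthesis:            {provider: ollama, model: llama3.1:70b}   # pending OQ-03
formatting:           {provider: ollama, model: llama3.1:8b}
retrieval_rerank:     {provider: ollama, model: llama3.1:8b}
constitutional_check: {provider: ollama, model: llama3.1:8b}
```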
Step 05
Phase 1B Build The Query Sanitisation Layer
CREATE: brain/security/query_sanitiser.py
IMPLEMENT: embed_and_discard(text: str) → np.ndarray — embeds, returns vector, text never stored
IMPLEMENT: salt_vector(v: np.ndarray, magnitude=0.02) → np.ndarray — adds noise, renormalises
IMPLEMENT: reconstruct_prompt(nodes: List[dict]) → str — builds LLM context from nodes only
VERIFY: embed_and_discard() returns 1024-dim vector. No string stored anywhere in the call.
Build this BEFORE the knowledge bank. The sanitiser is called by every component that handles a query. If you build the knowledge bank first and wire queries directly, you will have plaintext queries in your first log entries. The secure schema must be established before any query is ever made.
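A pure-Python sketch of salt_vector per the spec above (the real implementation would take an np.ndarray): small Gaussian noise, then renormalise to unit length so retrieval geometry is barely affected while exact reconstruction fails.

```python
import math
import random

def salt_vector(v: list, magnitude: float = 0.02, rng=random) -> list:
    # Add controlled noise to every component...
    salted = [x + rng.gauss(0.0, magnitude) for x in v]
    # ...then renormalise so cosine comparisons stay well-behaved.
    norm = math.sqrt(sum(x * x for x in salted))
    return [x / norm for x in salted]
```

Remember the split: the unsalted vector is used for retrieval in RAM; only the salted copy is ever written to the log.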
Step 06
Phase 1B Build The Knowledge Bank
CREATE: brain/knowledge/store.py — KnowledgeBank class
IMPLEMENT: store(text, title, node_type, source_id, collection, confidence, ...) → node_id
IMPLEMENT: query(query_text, limit=8, collection=None) → List[node] via sanitiser
IMPLEMENT: query_frameworks(query_text) → List[framework] — hits framework index first
IMPLEMENT: _init_table() — creates LanceDB table with NODE_SCHEMA if not exists
VERIFY: store() a test node. query() returns it. Framework index exists in brain/knowledge/frameworks/
The query() method MUST call sanitiser.embed_and_discard() — never accept query_text directly for embedding. The two-level index: query_frameworks() first, then query() within matched collections. This is the retrieval architecture that scales to 100,000 nodes without degradation.
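A toy sketch of the two-level flow with in-memory stand-ins; the kb_* functions and tiny fixtures below are illustrative, not the LanceDB-backed methods.

```python
def embed_and_discard(text: str) -> list:
    # Stand-in for the sanitiser: returns a vector, keeps no plaintext.
    return [float(len(text))]

# Fixtures standing in for the framework index and node collections.
FRAMEWORKS = [{"collection": "venture"}, {"collection": "ops"}]
NODES = {"venture": ["n1", "n2"], "ops": ["n3"]}

def kb_query_frameworks(vec: list) -> list:
    return FRAMEWORKS  # level 1: 50-100 summaries, always cheap

def kb_query_nodes(vec: list, collections: list, limit: int) -> list:
    hits = [n for c in collections for n in NODES.get(c, [])]
    return hits[:limit]

def two_level_query(query_text: str, limit: int = 8) -> list:
    vec = embed_and_discard(query_text)   # plaintext dies here
    frameworks = kb_query_frameworks(vec)
    collections = [f["collection"] for f in frameworks[:3]]
    return kb_query_nodes(vec, collections, limit)  # level 2: scoped search
```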
Step 07
Phase 1B Build The Confidence Scoring Engine
CREATE: brain/ingestion/confidence.py — ConfidenceScorer class
IMPLEMENT: score_source(source_type: str) → float — transcript=0.85, research=0.70, web=0.45
IMPLEMENT: score_claim_strength(text: str) → float — LLM via router classifies hypothesis/observation/validated
IMPLEMENT: score_corroboration(text: str, kb: KnowledgeBank) → float — queries existing bank
IMPLEMENT: final_score(source, claim, corroboration) → float — weighted combination
VERIFY: Score three test inputs: your own voice note, a PDF excerpt, a web article snippet. Scores should differ meaningfully.
Corroboration scoring is the most important of the three. When a new principle matches an existing node (cosine similarity > 0.85), do not create a duplicate — raise the existing node's confidence score and add the new source to its source_ids. This is how the knowledge bank gets calibrated to reality over time.
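The dedupe rule above can be sketched as follows, with a plain cosine helper and an in-memory bank standing in for LanceDB; the +0.05 confidence bump is an illustrative increment, not a specified value.

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def store_principle(bank: list, vector: list, confidence: float,
                    source_id: str, threshold: float = 0.85) -> dict:
    for node in bank:
        if cosine(vector, node["vector"]) > threshold:
            # Corroborated: raise existing confidence, attach the new source.
            node["confidence"] = min(1.0, node["confidence"] + 0.05)
            node["source_ids"].append(source_id)
            return node
    # Genuinely new principle: create a node.
    node = {"vector": vector, "confidence": confidence, "source_ids": [source_id]}
    bank.append(node)
    return node
```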
Step 08
Phase 1B Build The Two-Stage Ingestion Pipeline
CREATE: brain/ingestion/pipeline.py — IngestionPipeline class
STAGE 1: ingest_document(path) → chunk → embed → store as node_type=chunk → create source record
STAGE 2: extract_principles(source_id) → LLM reads chunks → extracts atomic insights → stores as node_type=principle with mechanism/situation/when_not populated
IMPLEMENT: Confidence scoring via ConfidenceScorer on every principle before storage
IMPLEMENT: Async: Stage 1 and Stage 2 run concurrently across document batches
VERIFY: Ingest one PDF. Stage 1 produces chunks. Stage 2 produces principles. Both visible in knowledge bank. Confidence scores vary by source.
Stage 2 prompt engineering matters enormously. The extraction prompt should instruct the LLM to produce one principle per atomic idea, populate mechanism (HOW/WHY this is true), situation (WHEN this applies), and when_not (WHEN this does NOT apply). Poor prompt = chunks with labels. Good prompt = genuine intelligence extraction.
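One possible shape for the Stage 2 prompt; the wording below is illustrative, not a canonical prompt from this document.

```python
# Illustrative extraction prompt: one principle per atomic idea, with the
# three qualifying fields the node schema requires.
EXTRACTION_PROMPT = """You are extracting atomic principles from the chunks below.
For each distinct idea, return one principle with exactly these fields:
- text: the principle, as one irreducible claim
- mechanism: HOW or WHY it is true
- situation: WHEN it applies
- when_not: WHEN it does NOT apply
Do not summarise chunks. Do not merge unrelated ideas. Return a JSON list.

CHUNKS:
{chunks}"""
```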
Step 09
Phase 1C Write Your SPINE.md — Do This Yourself
OPEN: brain/memory/SPINE.md — write this, do not generate it
SECTION: Who I Am & How I Reason — not your CV, your actual cognitive style
SECTION: Constitutional Clauses — evaluable OUTPUT MUST / OUTPUT NEVER rules
SECTION: Communication Preferences — directness level, format, what wastes your time
SECTION: Known Blindspots — where your thinking tends to go wrong, be honest
SECTION: What Good Output Looks Like — concrete examples if possible
Spend 30 minutes minimum on this. Everything your agents produce will be shaped by what you write here. The constitutional clauses are the most important part — write them as evaluable rules, not as preferences. 'I prefer directness' is not evaluable. 'OUTPUT MUST lead with the conclusion, never with the reasoning' is evaluable. The difference determines whether the constitutional check actually works.
Step 10
Phase 1C Write Your RAM.md — Current State
OPEN: brain/memory/RAM.md — keep this under 80 lines always
SECTION: Active Projects — Meridian Phase 1, Week 1
SECTION: Current Open Decisions — what you are deciding right now
SECTION: Recent Context — what happened in the last 7 days that matters
SECTION: This Week's Priority — complete foundation build
RAM.md is your working memory. SPINE.md is your identity. They serve different functions. RAM is high-churn and approximate — it gives agents current context without touching SPINE. The 80-line limit is not arbitrary: beyond 80 lines, RAM stops being working memory and starts being a second knowledge bank, which is the wrong architecture.
Step 11
Phase 1C Build The UI — Dashboard First
CREATE: ui/main.py — FastAPI application
PRIMARY VIEW: Knowledge bank dashboard — node count by collection, confidence distribution, framework index, recent ingestions, system health
SECONDARY VIEW: Chat panel — opens alongside dashboard, not replacing it
IMPLEMENT: Document drop zone → triggers ingestion pipeline → progress shown live
IMPLEMENT: Chat uses sanitiser → retrieves nodes → reconstructs prompt → Ollama → response with source nodes shown
VERIFY: Dashboard loads with stats. Drop zone accepts PDF. Chat responds from knowledge bank.
Dark theme. The UI should feel like a control room, not a chat app. Every response in the chat panel should show which knowledge bank nodes were used to generate it — this is what builds trust in the system and makes the compounding visible. Node count, confidence distribution, and framework growth should update in real time as ingestion runs.
Step 12
Phase 1D Proof of Life — The System Works
DROP: A PDF you know well into the document drop zone
WATCH: Stage 1 runs — chunk count appears in dashboard
WATCH: Stage 2 runs — principles appear with confidence scores, not 0.7 uniform
QUERY: Ask a question about the PDF content in the chat panel
VERIFY: Answer is drawn from knowledge bank nodes, not from LLM training data
VERIFY: Source nodes are shown. Evaluation log has entry with salted vector, no plaintext.
You are done when: the system answers questions from your knowledge bank, the answer traces back to specific nodes, confidence scores vary meaningfully across principles, and the evaluation log contains zero natural language. If any of these fail, diagnose before building Phase 2. A shaky foundation cannot support agents.
Step 13
Phase 1D Security Validation — No Plaintext Anywhere
CHECK: grep -r 'query_text' logs/ — should return zero results
CHECK: cat logs/evaluation.jsonl | python3 -c 'import sys,json; [print(k) for l in sys.stdin for k in json.loads(l).keys()]'
VERIFY: Log keys are only: timestamp, salted_query_vector, retrieved_node_ids, context_hash, response_hash, model_used, latency_ms
CHECK: brain/security/query_sanitiser.py — no function writes string to disk
VERIFY: Query the system 5 times with sensitive terms. Read the log. Nothing is reconstructable.
This validation is not optional. Run it before building any agent. If you find plaintext in logs at this stage, the fix is straightforward. If you find plaintext at Week 4 with agents running, the fix requires auditing every call path. Do it now.
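The key check above is easy to automate; a possible audit helper, assuming the seven-field schema from Decision 07:

```python
import json

ALLOWED_KEYS = {"timestamp", "salted_query_vector", "retrieved_node_ids",
                "context_hash", "response_hash", "model_used", "latency_ms"}

def audit_log_lines(lines: list) -> list:
    """Return (line_no, extra_keys) pairs; an empty list means the log is clean."""
    bad = []
    for i, line in enumerate(lines, 1):
        extra = set(json.loads(line)) - ALLOWED_KEYS
        if extra:
            bad.append((i, sorted(extra)))
    return bad
```

Running this over logs/evaluation.jsonl on every session start would catch a plaintext regression on day one rather than at Week 4.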
OQ-01
What gravity threshold triggers emission to the collective layer? High threshold = proven principles only, slow collective growth, maximum sovereignty. Low threshold = faster growth, slightly higher re-identification risk. This is a governance decision requiring founding consensus.
Open — requires founder consensus before collective layer build
OQ-02
What are your three to five monitoring domains for the Intelligence Analyst monitor mode? The more specific the domains, the faster the knowledge bank becomes genuinely useful versus generically informed.
Open — to be defined before Week 3 build
OQ-03
Which 30B+ parameter model for local reasoning? Llama 3.1 70B (Q4 ~40GB, tight on 48GB), Mixtral 8x7B (~26GB, strong reasoning), Qwen 2.5 32B (~18GB, excellent benchmark performance). Benchmark required on your actual task types before committing.
Open — requires benchmark on M5 Pro before Step 2
OQ-04
What is the SPINE.md review cadence? Changes have downstream effects on every agent. Too frequent = instability. Too infrequent = drift from reality. Suggest: triggered by significant life change OR quarterly minimum, whichever comes first.
Open — to decide after first 30 days of use
OQ-05
Collective layer terminology: replace 'node' with 'sovereign instance' or 'member system' throughout the whitepaper collective layer descriptions. Reserve 'node' exclusively for knowledge unit. When does this edit happen and who owns it?
Open — whitepaper edit required before technical founder onboarding
OQ-06
What is the Seed Codex agent bootstrap design? The base model ships with no agents — the Seed Codex guides first-time setup with a behavioural reflection loop (simulate agent response, client corrects simulation not abstract config). Design unresolved.
Open — Phase 2 concern, not Week 1