Structured Project Memory — Technical Specification (v1.0.0-rc.1)

1. Purpose

Structured Project Memory (SPM) is Lexon’s native “second brain.” It ingests raw artifacts (code, guides, decisions, logs), normalizes them into canonical MemoryObjects, stores them in pluggable tree/index backends, and exposes retrieval primitives (recall_context, recall_kind, before_action use_context) so every Lexon program can access curated context before or alongside traditional RAG. The goal is to provide durable knowledge per project with deterministic behavior, governance, and zero extra frameworks.

2. System overview

Raw assets ──> Semantic layer ──> MemoryObject ──> Backend (basic/patricia/raptor/hybrid)
     ^             |                                 |               |
     |             |                                 |               └─> Policy evaluation, summaries
     |             └─> remember_raw / remember_structured           ↓
  Lexon runtime <────────────────────── recall_context / recall_kind / before_action use_context

2.1 Components

  1. Semantic Layer
  2. Storage Layer (Pluggable Backend)
  3. Runtime Layer

3. Memory object schema

{
  "id": "mem_* (optional override)",
  "path": "project/module/topic",
  "kind": "guide|config|decision|log|custom",
  "raw": "... original text ...",
  "summary_micro": "1-2 sentences",
  "summary_short": "short paragraph / bullets",
  "summary_long": "long-form abstract",
  "tags": ["lowercase", "slugs"],
  "metadata": {"project": "...", "space": "...", "importance": "high"},
  "relevance": "high|medium|low",
  "pinned": true|false,
  "created_at": "RFC3339",
  "updated_at": "RFC3339"
}

Example object (with optional chunks[]):

{
  "id": "mem_runtime_decision_001",
  "path": "lexon/runtime/release_checklist",
  "kind": "decision",
  "raw": "Ship RC once structured memory recalls <200ms and MCP supervisor is green.",
  "summary_micro": "RC go/no-go checklist",
  "summary_short": "Decisions and blockers for promoting Lexon v1.0.0-rc.1 to GA.",
  "summary_long": "Context, telemetry targets, and required samples-smoke pass/fail report for the release.",
  "tags": ["runtime", "release", "checklist"],
  "metadata": {"project": "lexon", "space": "runtime", "source": "docs/runtime_decision.md", "hash": "sha256:6be0...", "pii_flags": []},
  "relevance": "high",
  "pinned": true,
  "chunks": [
    {"chunk_id": "runtime_decision_001#0", "text": "Context..."},
    {"chunk_id": "runtime_decision_001#1", "text": "Checklist..."}
  ],
  "created_at": "2025-12-01T11:32:00Z",
  "updated_at": "2025-12-01T12:04:00Z"
}
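
The schema above can be checked mechanically before an object is stored. The following Python sketch is illustrative only: the function name validate_memory_object and the choice of path, kind, and raw as required fields are assumptions, not part of the Lexon runtime.

```python
from datetime import datetime

ALLOWED_KINDS = {"guide", "config", "decision", "log", "custom"}
ALLOWED_RELEVANCE = {"high", "medium", "low"}
# Assumption: "id" is optional (the schema allows an override), the rest are required.
REQUIRED_FIELDS = ("path", "kind", "raw")


def validate_memory_object(obj: dict) -> list:
    """Return a list of problems; an empty list means the object matches the schema."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not obj.get(field):
            problems.append(f"missing field: {field}")
    kind = obj.get("kind")
    if kind and kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind: {kind}")
    if obj.get("relevance", "medium") not in ALLOWED_RELEVANCE:
        problems.append(f"unknown relevance: {obj['relevance']}")
    if not isinstance(obj.get("pinned", False), bool):
        problems.append("pinned must be a boolean")
    for tag in obj.get("tags", []):
        if tag != tag.lower():
            problems.append(f"tag is not lowercase: {tag}")
    for key in ("created_at", "updated_at"):
        if key in obj:
            try:
                # fromisoformat does not accept a trailing "Z" on older Pythons.
                datetime.fromisoformat(obj[key].replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{key} is not an RFC3339 timestamp: {obj[key]}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an ingest pipeline report every issue in a single pass.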

3.1 Retention + PII

Retention enforcement flow:

  1. remember_* stamps created_at/updated_at using the deterministic clock when freeze_clock is set.
  2. set_memory_policy with ttl_days + allow_delete=true enables memory_space.gc(space) to purge expired objects (and write audit logs).
  3. Redaction occurs before serialization: detectors strip emails/API keys from raw and stash hashes inside metadata.pii_flags.
  4. Exports are JSON bundles signed with space + hash so teams can hand them to audit/compliance or restore elsewhere.
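
The redaction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two regular expressions, the [REDACTED:...] placeholder, and the "name:sha256:digest" flag format are hypothetical and are not the detectors Lexon ships.

```python
import hashlib
import re

# Assumed, deliberately simple detectors; real detectors would be broader.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(obj: dict) -> dict:
    """Return a copy of obj with PII stripped from raw and hashes kept in metadata.pii_flags."""
    raw = obj["raw"]
    flags = []
    for name, pattern in DETECTORS.items():
        def _replace(match, name=name):
            # Keep only a hash of the secret so audits can correlate without seeing it.
            digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
            flags.append(f"{name}:sha256:{digest}")
            return f"[REDACTED:{name}]"
        raw = pattern.sub(_replace, raw)
    metadata = dict(obj.get("metadata", {}))
    metadata["pii_flags"] = flags
    return {**obj, "raw": raw, "metadata": metadata}
```

Because redaction runs before serialization, the original secret never reaches the persisted file; only its hash does.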

Deletion/export commands are CLI-level operations so teams can satisfy GDPR/CCPA requests without editing Lexon files manually.
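
The purge that a gc run performs under a TTL policy might look like the Python sketch below. It assumes expiry is measured from updated_at and that pinned objects are not exempt; neither is confirmed by this specification, and gc_space and the audit entry shape are hypothetical.

```python
from datetime import datetime, timedelta


def gc_space(objects: list, policy: dict, now: datetime):
    """Split objects into (kept, audit_log) according to a ttl_days / allow_delete policy."""
    ttl_days = policy.get("ttl_days")
    if not policy.get("allow_delete") or ttl_days is None:
        # Without explicit permission nothing is ever deleted.
        return list(objects), []
    cutoff = now - timedelta(days=ttl_days)
    kept, audit = [], []
    for obj in objects:
        updated = datetime.fromisoformat(obj["updated_at"].replace("Z", "+00:00"))
        if updated < cutoff:
            # Record every purge so the deletion can be proven later.
            audit.append({"action": "purge", "id": obj["id"], "at": now.isoformat()})
        else:
            kept.append(obj)
    return kept, audit
```

Passing now in explicitly, instead of reading the system clock inside the function, keeps the purge reproducible when the clock is frozen for tests.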

4. API surface (Lexon primitives)

Determinism hooks:

  - freeze_clock overrides timestamps in bundles for goldens.
  - {"reset": true} ensures predictable states in tests.
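
To illustrate why a frozen clock makes golden files stable, here is a small Python sketch. The Clock class and stamp helper are hypothetical stand-ins; only the freeze_clock name comes from this specification.

```python
from datetime import datetime, timezone


class Clock:
    """Returns RFC3339 timestamps; a frozen value, if given, always wins."""

    def __init__(self, freeze_clock=None):
        self._frozen = freeze_clock

    def now(self) -> str:
        if self._frozen is not None:
            return self._frozen
        return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def stamp(obj: dict, clock: Clock) -> dict:
    """Set updated_at on every write and created_at only on the first one."""
    now = clock.now()
    return {**obj, "created_at": obj.get("created_at", now), "updated_at": now}
```

With a frozen clock, two runs over the same input produce byte-identical timestamps, so the output can be compared against a golden file.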

5. Backend designs

5.1 Basic backend

5.2 Patricia backend

5.3 RAPTOR backend

5.4 Hybrid (GraphRAG/MemTree) backend

5.5 Backend selection

5.6 Pinning semantics

5.7 Backend guarantees at a glance

All backends share the same persistence primitive (MemorySpaceFile). Switching only changes ordering/scoring; the serialized data stays identical.
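
The separation between shared storage and backend-specific ordering can be pictured with the Python sketch below. The two ordering functions are hypothetical and are not the Lexon backends; they only show that different strategies can read the same records without rewriting them.

```python
import json

# One shared record list stands in for the persisted space.
records = [
    {"id": "mem_1", "path": "lexon/runtime/telemetry", "kind": "guide"},
    {"id": "mem_2", "path": "lexon/docs/intro", "kind": "guide"},
    {"id": "mem_3", "path": "lexon/runtime/release_checklist", "kind": "decision"},
]


def order_by_insertion(items, topic):
    """Hypothetical strategy: keep storage order, ignore the topic."""
    return list(items)


def order_by_path_segment(items, topic):
    """Hypothetical strategy: records whose path contains the topic segment come first."""
    return sorted(items, key=lambda o: topic not in o["path"].split("/"))


STRATEGIES = {"insertion": order_by_insertion, "path_segment": order_by_path_segment}


def recall_ids(strategy: str, topic: str) -> list:
    """Return record ids in the order the chosen strategy produces."""
    return [o["id"] for o in STRATEGIES[strategy](records, topic)]
```

Both strategies return new lists and leave the stored records untouched, which is the property that makes switching safe.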

6. Bundled context format

recall_context returns:

{
  "space": "lexon_demo",
  "topic": "runtime",
  "generated_at": "2025-01-01T00:00:00Z",
  "global_summary": "Context bundle for 'runtime' (N items): ...",
  "sections": [
    {
      "id": "...",
      "path": "...",
      "kind": "...",
      "summary_micro": "...",
      "summary_short": "...",
      "summary_long": "...",
      "relevance": "...",
      "pinned": true,
      "tags": [...],
      "metadata": {...},
      "updated_at": "..."
    }
  ],
  "raw": [
    {"path": "...", "raw": "...", "kind": "..."}
  ],
  "limit": 2
}

Ordering rules:

  - Pinned + high relevance first.
  - Respect limit, raw_limit.
  - require_high_relevance filters out lower scores.
  - prefer_kinds and prefer_tags boost relevant memories.
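
One possible reading of the ordering rules is sketched below in Python. The exact scoring is an assumption: this version sorts by pinned, then relevance, then a simple count-based boost, and it does not model raw_limit. The function name order_sections is hypothetical.

```python
RELEVANCE_RANK = {"high": 2, "medium": 1, "low": 0}


def order_sections(objects, limit, require_high_relevance=False,
                   prefer_kinds=(), prefer_tags=()):
    """Filter, rank, and truncate memory objects for a context bundle."""
    candidates = [
        o for o in objects
        if not require_high_relevance or o.get("relevance") == "high"
    ]

    def sort_key(o):
        # Assumed boost: one point for a preferred kind plus one per preferred tag.
        boost = int(o.get("kind") in prefer_kinds)
        boost += len(set(o.get("tags", [])) & set(prefer_tags))
        return (
            not o.get("pinned", False),                      # pinned first
            -RELEVANCE_RANK.get(o.get("relevance"), 0),      # then high relevance
            -boost,                                          # then preferred kinds/tags
        )

    # sorted() is stable, so ties keep their storage order.
    return sorted(candidates, key=sort_key)[:limit]
```

Relying on a stable sort means objects with equal scores always come back in the same order, which keeps recall results deterministic.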

6.1 Ingest → index → query flow

watcher (git diff / filesystem) --> enqueue(raw_asset)
  -> normalizer (assign path/kind/tags)
  -> remember_raw / remember_structured
      -> dedupe by metadata.hash
      -> persist to MemorySpaceFile (JSON)
      -> backend caches ordering view
recall_context/topic --> backend.order_for_topic --> bundle --> before_action hook --> agents/prompts
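
The dedupe step in this flow can be sketched in Python as shown below. An in-memory list stands in for MemorySpaceFile, and the helper names content_hash and remember are hypothetical; the "sha256:" prefix follows the example object in section 3.

```python
import hashlib


def content_hash(raw: str) -> str:
    """Hash the raw text so identical content always maps to the same key."""
    return "sha256:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def remember(space: list, obj: dict) -> bool:
    """Store obj unless an object with the same metadata.hash already exists."""
    metadata = dict(obj.get("metadata", {}))
    digest = metadata.get("hash") or content_hash(obj["raw"])
    if any(existing["metadata"]["hash"] == digest for existing in space):
        return False  # duplicate content: skip the write
    metadata["hash"] = digest
    space.append({**obj, "metadata": metadata})
    return True
```

Keying on a content hash rather than on the path means re-ingesting an unchanged file is a no-op, while an edited file produces a new hash and is stored.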

Pseudo-API for deterministic ingest:

let files = filesystem.watch("docs/runtime/*.md");
for file in files {
  let raw = read_file(file.path);
  remember_raw("runtime", "guide", raw, strings.json({
    path_hint: file.path,
    project: "lexon",
    tags: ["runtime", file.name]
  }));
}

Chunk metadata (chunks[]) is optional, but when present it lets downstream RAG pipelines correlate structured memory with vector search hits.
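
As an illustration of that correlation, the Python sketch below joins vector search hits back to their memory objects through chunk_id. The hit shape ({"chunk_id", "score"}) and the function name correlate are assumptions about the downstream RAG pipeline, not something this specification defines.

```python
def correlate(hits: list, objects: list) -> list:
    """Attach the owning memory object's id and micro summary to each vector hit."""
    # Build a chunk_id -> memory object index once, then resolve hits in O(1).
    index = {
        chunk["chunk_id"]: obj
        for obj in objects
        for chunk in obj.get("chunks", [])
    }
    resolved = []
    for hit in hits:
        owner = index.get(hit["chunk_id"])
        if owner is None:
            continue  # the hit points at content that is not in structured memory
        resolved.append({
            "chunk_id": hit["chunk_id"],
            "score": hit["score"],
            "memory_id": owner["id"],
            "summary_micro": owner.get("summary_micro", ""),
        })
    return resolved
```

Objects without chunks[] simply contribute nothing to the index, which matches chunk metadata being optional.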

7. Governance & observability

8. Error handling

9. End-to-end example (repo delta QA)

Goal: index today’s repo changes and answer “What changed in telemetry?”

  1. Reset the space (Lexon), then ingest recent diffs (shell):

    memory_space.create("lexon_repo", """{"reset": true}""");

    git diff --name-only HEAD~1 | grep '\.rs$' | while read -r file; do \
      lexc remember_raw --space lexon_repo --kind code --path "$file" "$file"; \
    done
  2. Recall telemetry:

    let bundle = recall_context("lexon_repo", "telemetry", """{"limit":3,"include_raw":true}""");
    print(bundle.global_summary);
  3. Feed bundle into agents/RAG:

    before_action use_context project="lexon_repo", topic="telemetry";
    let analyst = agent_create("lexon_repo_analyst", """{"model":"openai:gpt-4o-mini"}""");
    let report = agent_run(analyst, "Describe telemetry changes today", """{"deadline_ms":15000}""");
    print(report);
  4. Export or prune:

    lexc memory export lexon_repo --out exports/lexon_repo.json
    lexc memory gc lexon_repo --ttl-days 30

The flow exercises ingestion, ordering, recall, MCP hooks, and governance knobs end-to-end.

10. Roadmap (structured memory track)

See ROADMAP.md for cross-cutting roadmap items (DX, providers, IR optimizations, networking/stdlib, sockets, CI hardening, etc.).


Last updated: 2025-12-03
Files referenced: lexc/src/executor/structured_memory.rs, lexc/src/executor/structured_memory/backends/*.rs, samples/memory/structured_semantic.lx, golden/memory/structured_semantic.txt.