clawft

ECC Cognitive Substrate

Ephemeral Causal Cognition: causal DAG, cognitive tick, cross-references, HNSW vector search, impulse queue, calibration, and the three operating modes.

The Ephemeral Causal Cognition (ECC) substrate is the cognitive layer of WeftOS. It implements a distributed nervous system where every kernel instance is a node -- from embedded sensor devices to GPU servers -- all running the same Kernel<P: Platform>, differentiated by boot-time calibration.

Feature: ecc
Source: 6 modules in crates/clawft-kernel/src/ (83 tests)
Phase: K3c

Quantum acceleration (experimental, 0.6.7+): ECC's graph-spectral operations can be offloaded onto real neutral-atom quantum processors via the Quantum Cognitive Layer. The classical path remains the default; quantum is a compile-time opt-in.

What is ECC

ECC provides the kernel with causal reasoning, semantic search, adaptive processing, and cross-structure linking. It is not an optional analytics add-on -- it is the foundation for WeftOS as a cognitive platform where agents reason about cause and effect, detect novelty, and maintain coherent world models.

The ECC substrate implements a forest of trees architecture (Symposium D2): multiple domain-specific structures (ExoChain, Resource Tree, HNSW Index, Causal Graph) linked by CrossRefs and Impulses. Each structure uses data structures appropriate to its domain.

Three Operating Modes

The same cognitive engine operates in three modes that compose into a continuous loop:

Mode 1: Act (Real-Time)

Agents produce actions in real-time. The engines process each contribution within the cognitive tick: causal graph updates, belief tracking, coherence scoring, and affect modulation. The ExoChain records every committed action with full provenance.

Mode 2: Analyze (Post-Hoc)

Given an existing corpus (transcripts, PR history, sprint commits), run the engines in read-only mode to reconstruct contribution trees, infer goals, map coherence, and identify gaps.

Mode 3: Generate (Goal-Directed)

Set a goal, spawn expert agent-processes, and let them converse toward the desired output. The conversation IS the development process, with full causal provenance.

Composition

GENERATE --> ANALYZE --> ACT --> GENERATE --> ...

Each transition is a causal edge. Training material falls out naturally -- every scored witness entry is a training sample.

CausalGraph

Source: crates/clawft-kernel/src/causal.rs (~700 lines, 22 tests)

A concurrent DAG in which nodes represent events or observations and edges encode causal relationships with weights and provenance.

pub type NodeId = u64;  // Local to causal module

pub enum CausalEdgeType {
    Causes,        // A directly causes B
    Inhibits,      // A suppresses B
    Correlates,    // Statistical correlation
    Enables,       // A is a precondition for B
    Follows,       // A temporally follows B
    Contradicts,   // A provides evidence against B
    TriggeredBy,   // Created by a ClawStage trigger
    EvidenceFor,   // A supports B
}

pub struct CausalEdge {
    pub source: NodeId,
    pub target: NodeId,
    pub edge_type: CausalEdgeType,
    pub weight: f64,
    pub provenance: String,
}

API: add_node, get_node, remove_node, link, unlink, get_forward_edges, get_reverse_edges, traverse_forward, traverse_reverse, find_path

Built on DashMap for safe concurrent access from multiple agent threads. BFS traversal for path finding.
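As a rough illustration of the BFS path finding mentioned above, the sketch below uses a plain std HashMap adjacency list instead of DashMap and is single-threaded. SketchGraph and its method bodies are hypothetical; only the names link and find_path are borrowed from the API list, and their real signatures may differ.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type NodeId = u64;

/// Hypothetical single-threaded stand-in for the causal graph.
#[derive(Default)]
struct SketchGraph {
    forward: HashMap<NodeId, Vec<(NodeId, f64)>>, // source -> (target, weight)
}

impl SketchGraph {
    /// Record a directed edge source -> target.
    fn link(&mut self, source: NodeId, target: NodeId, weight: f64) {
        self.forward.entry(source).or_default().push((target, weight));
    }

    /// Breadth-first search; returns a fewest-hop path if one exists.
    fn find_path(&self, from: NodeId, to: NodeId) -> Option<Vec<NodeId>> {
        let mut parent: HashMap<NodeId, NodeId> = HashMap::new();
        let mut seen: HashSet<NodeId> = HashSet::from([from]);
        let mut queue: VecDeque<NodeId> = VecDeque::from([from]);
        while let Some(node) = queue.pop_front() {
            if node == to {
                // Walk parent pointers back to the start, then reverse.
                let mut path = vec![to];
                let mut cur = to;
                while let Some(&p) = parent.get(&cur) {
                    path.push(p);
                    cur = p;
                }
                path.reverse();
                return Some(path);
            }
            for &(next, _weight) in self.forward.get(&node).into_iter().flatten() {
                if seen.insert(next) {
                    parent.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
        None
    }
}
```

BFS ignores edge weights, so the returned path minimizes hop count rather than any weighted cost.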

CognitiveTick

Source: crates/clawft-kernel/src/cognitive_tick.rs (~550 lines, 20 tests)

The heartbeat of the ECC substrate. Drives cognitive processing at a configurable interval with adaptive adjustment.

pub struct CognitiveTickConfig {
    pub tick_interval_ms: u32,       // Default: 50
    pub tick_budget_ratio: f32,      // Default: 0.3 (30% of interval for compute)
    pub calibration_ticks: u32,      // Default: 100
    pub adaptive_tick: bool,         // Default: true
    pub adaptive_window_s: u32,      // Default: 30
}

pub struct CognitiveTickStats {
    pub tick_count: u64,
    pub current_interval_ms: u32,
    pub avg_compute_us: u64,
    pub max_compute_us: u64,
    pub drift_detected: bool,
}

Implements SystemService (name: ecc.cognitive_tick). The tick interval is self-calibrated at boot, auto-adjusted at runtime, and advertised to peers as a cluster capability (Symposium D3).
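To make the budget arithmetic concrete: with the defaults above, a 50 ms tick and a 0.3 budget ratio leave 15,000 us of compute per tick. The sketch below shows that calculation plus one possible adjustment rule. The rule in suggest_interval_ms is an assumption for illustration, not the kernel's actual adaptive algorithm, and both function names are hypothetical.

```rust
/// Compute budget in microseconds: the share of one tick reserved for compute.
fn budget_us(tick_interval_ms: u32, tick_budget_ratio: f32) -> u64 {
    (tick_interval_ms as f64 * 1000.0 * tick_budget_ratio as f64).round() as u64
}

/// Hypothetical adjustment rule (not taken from the kernel source):
/// choose the smallest whole-millisecond interval whose budget covers the
/// observed average compute time, clamped to [min_ms, max_ms].
fn suggest_interval_ms(avg_compute_us: u64, ratio: f32, min_ms: u32, max_ms: u32) -> u32 {
    let needed_ms = (avg_compute_us as f64 / 1000.0 / ratio as f64).ceil() as u32;
    needed_ms.clamp(min_ms, max_ms)
}
```

Under this rule a node averaging 30,000 us of compute per tick would be moved to a 100 ms interval, while a node averaging 100 us would sit at whatever minimum interval is configured.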

CrossRefStore

Source: crates/clawft-kernel/src/crossref.rs (~425 lines, 12 tests)

Universal cross-references between the forest structures.

pub enum StructureTag {
    ExoChain,       // 0x01
    ResourceTree,   // 0x02
    CausalGraph,    // 0x03
    HnswIndex,      // 0x04
    Custom(u8),     // 0x10+
}

pub struct UniversalNodeId {
    pub structure: StructureTag,
    pub local_id: String,
    pub hash: [u8; 32],  // BLAKE3
}

UniversalNodeId uses BLAKE3 hashing (Symposium D6: new ECC code uses BLAKE3; existing ExoChain keeps SHAKE-256 until K6 migration).

pub enum CrossRefType {
    DerivedFrom,
    References,
    Contradicts,
    Supersedes,
    Custom(String),
}

API: insert, get_forward, get_reverse, get_all, by_type

Concurrent forward/reverse index enables grafting (linking) and shaking (pruning) across any pair of structures.

HnswService

Source: crates/clawft-kernel/src/hnsw_service.rs (~280 lines, 11 tests)

Thread-safe kernel wrapper around clawft_core::embeddings::hnsw_store::HnswStore.

pub struct HnswServiceConfig {
    pub ef_search: usize,           // Default: 100
    pub ef_construction: usize,     // Default: 200
    pub default_dimensions: usize,  // Default: 384
}

pub struct HnswSearchResult {
    pub id: String,
    pub score: f64,  // Cosine similarity
}

Implements SystemService (name: ecc.hnsw). API: insert, search, len, clear, insert_count, search_count.

ImpulseQueue

Source: crates/clawft-kernel/src/impulse.rs (~320 lines, 8 tests)

Ephemeral causal events that flow between the four ECC structures.

pub enum ImpulseType {
    BeliefUpdate,      // causal -> hnsw (new embedding needed)
    CoherenceAlert,    // spectral -> causal (graph incoherent)
    NoveltyDetected,   // hnsw -> causal (new cluster found)
    EdgeConfirmed,     // cloud -> edge (validated)
    EmbeddingRefined,  // cloud -> edge (better embedding)
    Custom(u8),
}

API: emit, drain_ready, pending_count, clear

Impulses are sorted by HLC (hybrid logical clock) timestamp for causal ordering. Structure tags are raw u8 values matching StructureTag::as_u8().
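One way to picture an HLC-ordered queue is a min-heap keyed on a (physical time, logical counter) pair. The sketch below is illustrative only: Hlc, Impulse, and SketchQueue are hypothetical shapes, and the meaning of "ready" (stamped at or before a given HLC value) is an assumption. Only the method names emit, drain_ready, and pending_count come from the API list.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical hybrid-logical-clock stamp: wall-clock millis, then a logical
/// counter that breaks ties. Derived ordering compares fields top to bottom.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    physical_ms: u64,
    logical: u32,
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Impulse {
    at: Hlc,        // first field, so it dominates the derived ordering
    source_tag: u8, // raw structure tag, e.g. 0x03 for CausalGraph
    target_tag: u8,
    kind: u8,       // stand-in for ImpulseType
}

#[derive(Default)]
struct SketchQueue {
    heap: BinaryHeap<Reverse<Impulse>>, // Reverse turns the max-heap into a min-heap
}

impl SketchQueue {
    fn emit(&mut self, impulse: Impulse) {
        self.heap.push(Reverse(impulse));
    }

    fn pending_count(&self) -> usize {
        self.heap.len()
    }

    /// Pop every impulse stamped at or before `now`, oldest first.
    fn drain_ready(&mut self, now: Hlc) -> Vec<Impulse> {
        let mut out = Vec::new();
        while self.heap.peek().map_or(false, |Reverse(i)| i.at <= now) {
            if let Some(Reverse(i)) = self.heap.pop() {
                out.push(i);
            }
        }
        out
    }
}
```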

EccCalibration

Source: crates/clawft-kernel/src/calibration.rs (~410 lines, 10 tests)

Boot-time benchmarking that determines hardware capability:

pub struct EccCalibrationConfig {
    pub calibration_ticks: u32,       // Default: 30
    pub tick_interval_ms: u32,        // Default: 50
    pub tick_budget_ratio: f32,       // Default: 0.3
    pub vector_dimensions: usize,     // Default: 384
}

Measures HNSW insert/search latency, causal edge creation latency, and BLAKE3 hashing speed. Returns p50/p95 timings that auto-tune the cognitive tick interval.
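The p50/p95 figures can be pictured with a small timing harness. This is a minimal sketch, assuming a nearest-rank percentile definition; the function names percentile and calibrate are hypothetical, and the real calibration may sample and aggregate differently.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile over collected samples (p in 0..=100).
/// Sorts the slice in place.
fn percentile(samples: &mut [Duration], p: usize) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    // Nearest rank: ceil(p/100 * n), converted to a zero-based index.
    let rank = (p * samples.len() + 99) / 100;
    Some(samples[rank.saturating_sub(1).min(samples.len() - 1)])
}

/// Time `op` once per calibration tick and report (p50, p95).
fn calibrate<F: FnMut()>(ticks: u32, mut op: F) -> Option<(Duration, Duration)> {
    let mut samples: Vec<Duration> = (0..ticks)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    Some((percentile(&mut samples, 50)?, percentile(&mut samples, 95)?))
}
```

In this sketch, op would be one HNSW insert, one causal edge creation, or one hash of a fixed-size buffer, each measured in its own calibrate call.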

One feature flag, boot decides (Symposium D8): compile-time --features ecc includes all modules; boot-time calibration determines what is active.

Cluster Advertisement

NodeEccCapability in cluster.rs advertises ECC capabilities to peers:

pub struct NodeEccCapability {
    pub tick_interval_ms: u32,
    pub hnsw_dimensions: usize,
    pub causal_node_count: u64,
}

This enables heterogeneous swarms where a glasses node (50ms tick) and a server node (10ms tick) participate in the same nervous system.

Vector Search Backends

The ECC substrate supports pluggable vector search via the VectorBackend trait. Three backends ship with WeftOS:

HNSW (default)

In-memory approximate nearest neighbor search via instant-distance. Best for datasets under 1 million vectors where low-latency real-time search is required.

[kernel.vector]
backend = "hnsw"
dimensions = 384
ef_search = 100
ef_construction = 200

DiskANN (SSD-backed)

SSD-backed approximate nearest neighbor search via ruvector-diskann v2.1. Uses a Vamana graph with optional product quantization and mmap persistence. Best for large corpora (1M+ vectors) where memory is constrained.

[kernel.vector]
backend = "diskann"
dimensions = 384

[kernel.vector.diskann]
max_points = 10_000_000
num_neighbors = 64
search_list_size = 128
data_path = ".weftos/diskann"
use_pq = true
pq_num_chunks = 32

Requires the diskann feature flag: scripts/build.sh native --features diskann

Without the feature flag, a brute-force stub provides the same API with linear scans (useful for development and testing).

Hybrid (HNSW + DiskANN)

Hot HNSW cache in front of a cold DiskANN store. Frequently accessed vectors are promoted to the in-memory HNSW layer; cold vectors are evicted back to DiskANN via LRU. Best when you need both real-time latency for hot data and large-corpus capacity.

[kernel.vector]
backend = "hybrid"
dimensions = 384

[kernel.vector.hnsw]
ef_search = 100
ef_construction = 200

[kernel.vector.diskann]
max_points = 10_000_000
data_path = ".weftos/diskann"

[kernel.vector.hybrid]
hot_capacity = 100_000
promotion_threshold = 3
eviction_batch_size = 1000

VectorBackend Trait

All backends implement the VectorBackend trait:

pub trait VectorBackend: Send + Sync {
    fn insert(&self, id: &str, vector: &[f32]) -> VectorResult<()>;
    fn search(&self, query: &[f32], k: usize) -> VectorResult<Vec<SearchResult>>;
    fn remove(&self, id: &str) -> bool;
    fn flush(&self) -> VectorResult<()>;
    fn len(&self) -> usize;
    fn backend_name(&self) -> &str;
}
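For orientation, here is a minimal sketch of a linear-scan backend implementing the trait, in the spirit of the brute-force stub mentioned earlier. It is not the shipped stub. The definitions of VectorError, VectorResult, and SearchResult are assumptions (the page names these types but does not show them), and BruteForce and cosine are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Assumed shapes: the source names these types but does not define them here.
#[derive(Debug)]
pub enum VectorError {
    DimensionMismatch { expected: usize, got: usize },
}
pub type VectorResult<T> = Result<T, VectorError>;

#[derive(Debug, Clone, PartialEq)]
pub struct SearchResult {
    pub id: String,
    pub score: f64,
}

pub trait VectorBackend: Send + Sync {
    fn insert(&self, id: &str, vector: &[f32]) -> VectorResult<()>;
    fn search(&self, query: &[f32], k: usize) -> VectorResult<Vec<SearchResult>>;
    fn remove(&self, id: &str) -> bool;
    fn flush(&self) -> VectorResult<()>;
    fn len(&self) -> usize;
    fn backend_name(&self) -> &str;
}

/// Linear-scan backend: every search compares the query against all vectors.
pub struct BruteForce {
    dims: usize,
    data: RwLock<HashMap<String, Vec<f32>>>,
}

impl BruteForce {
    pub fn new(dims: usize) -> Self {
        Self { dims, data: RwLock::new(HashMap::new()) }
    }

    fn check(&self, v: &[f32]) -> VectorResult<()> {
        if v.len() == self.dims {
            Ok(())
        } else {
            Err(VectorError::DimensionMismatch { expected: self.dims, got: v.len() })
        }
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| (*x as f64) * (*y as f64)).sum();
    let na = a.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl VectorBackend for BruteForce {
    fn insert(&self, id: &str, vector: &[f32]) -> VectorResult<()> {
        self.check(vector)?;
        self.data.write().unwrap().insert(id.to_string(), vector.to_vec());
        Ok(())
    }

    fn search(&self, query: &[f32], k: usize) -> VectorResult<Vec<SearchResult>> {
        self.check(query)?;
        let mut hits: Vec<SearchResult> = self.data.read().unwrap().iter()
            .map(|(id, v)| SearchResult { id: id.clone(), score: cosine(query, v) })
            .collect();
        hits.sort_by(|a, b| b.score.total_cmp(&a.score)); // highest similarity first
        hits.truncate(k);
        Ok(hits)
    }

    fn remove(&self, id: &str) -> bool {
        self.data.write().unwrap().remove(id).is_some()
    }

    fn flush(&self) -> VectorResult<()> {
        Ok(()) // nothing to persist in this in-memory sketch
    }

    fn len(&self) -> usize {
        self.data.read().unwrap().len()
    }

    fn backend_name(&self) -> &str {
        "brute-force-sketch"
    }
}
```

Because every method takes &self, interior mutability (here an RwLock) is needed for any backend that stores state, which is consistent with the trait's Send + Sync bound.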

Hybrid Promotion/Eviction

In hybrid mode, the backend tracks an access counter per vector. When a vector stored in the cold DiskANN layer is accessed more than promotion_threshold times, it is copied into the hot HNSW cache. When the hot cache exceeds hot_capacity, the least-recently-used vectors are evicted in batches of eviction_batch_size back to the cold layer. A search queries both layers, then merges and deduplicates the results by score.
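The promotion and eviction bookkeeping can be sketched as follows. This is an illustration under assumptions, not the backend's implementation: HotTier is a hypothetical type that tracks ids only (no vectors), uses a simple O(n) LRU list, and never evicts the vector it has just promoted.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bookkeeping for the hot tier; vectors themselves are omitted.
struct HotTier {
    hot_capacity: usize,
    promotion_threshold: u32,
    eviction_batch_size: usize,
    access_counts: HashMap<String, u32>, // accesses to vectors still in the cold layer
    lru: VecDeque<String>,               // front = least recently used
}

impl HotTier {
    fn new(hot_capacity: usize, promotion_threshold: u32, eviction_batch_size: usize) -> Self {
        Self {
            hot_capacity,
            promotion_threshold,
            eviction_batch_size,
            access_counts: HashMap::new(),
            lru: VecDeque::new(),
        }
    }

    fn is_hot(&self, id: &str) -> bool {
        self.lru.iter().any(|h| h == id)
    }

    /// Record one access; returns the ids evicted back to the cold layer.
    fn touch(&mut self, id: &str) -> Vec<String> {
        if let Some(pos) = self.lru.iter().position(|h| h == id) {
            // Already hot: move to the most-recently-used end.
            let hit = self.lru.remove(pos).expect("position is in range");
            self.lru.push_back(hit);
            return Vec::new();
        }
        let count = self.access_counts.entry(id.to_string()).or_insert(0);
        *count += 1;
        if *count <= self.promotion_threshold {
            return Vec::new(); // still cold
        }
        // Accessed more than promotion_threshold times: promote.
        self.access_counts.remove(id);
        self.lru.push_back(id.to_string());
        let mut evicted = Vec::new();
        if self.lru.len() > self.hot_capacity {
            // Evict one batch from the LRU end, keeping the newly promoted id.
            for _ in 0..self.eviction_batch_size.min(self.lru.len() - 1) {
                if let Some(old) = self.lru.pop_front() {
                    evicted.push(old);
                }
            }
        }
        evicted
    }
}
```

With promotion_threshold = 3, a cold vector is promoted on its fourth access, matching the "more than promotion_threshold times" wording above.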

When to Use Each Backend

| Backend | Best For | Dataset Size | Latency |
| --- | --- | --- | --- |
| HNSW | Real-time search, small-medium corpora | < 1M vectors | Sub-millisecond |
| DiskANN | Large corpora, memory-constrained hosts | 1M+ vectors | Low milliseconds |
| Hybrid | Mixed workloads with hot/cold access | Any size | Sub-ms (hot), low-ms (cold) |

Sprint 17: ECC Enhancements

RFF Spectral Analysis (KG-004)

Random Fourier Feature (RFF) spectral analysis provides an O(m) approximation to the full eigenvalue decomposition. Instead of computing exact eigenvalues via Lanczos iteration, RFF projects the graph Laplacian into a random feature space and estimates spectral properties from the projection. This is faster than Lanczos for large graphs while providing sufficient accuracy for coherence monitoring and community detection.

Geometric Shadowing (KG-009)

Age-aware edge weight decay with per-edge volatility tracking. Edges that have not been reinforced by new evidence decay geometrically over time, with the decay rate proportional to the edge's historical volatility. High-volatility edges (frequently created and removed) decay faster than stable edges. This prevents stale causal relationships from dominating the graph.

// Decay is applied during each cognitive tick
let decayed_weight = original_weight * base_decay.powf(age * volatility);
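A worked example may help show how volatility speeds up decay. The sketch wraps the same expression in a function; the base_decay value of 0.5 and the unit of age used in the note below are assumptions chosen for easy arithmetic, not kernel defaults.

```rust
/// Same expression as above, wrapped in a function for experimentation.
/// `base_decay` is expected to lie in (0, 1) so that weights shrink with age.
fn decayed_weight(original_weight: f64, base_decay: f64, age: f64, volatility: f64) -> f64 {
    original_weight * base_decay.powf(age * volatility)
}
```

With base_decay = 0.5 and age = 2, an edge with volatility 1.0 keeps 0.5^2 = 25% of its weight, while an edge with volatility 2.0 keeps only 0.5^4 = 6.25%.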

VQ Codebook Cold-Start (KG-014)

Vector Quantization codebook initialization using K-means++ for new entities entering the HNSW index. When a new domain or entity type is encountered without pre-existing embeddings, the VQ codebook bootstraps initial cluster centroids from the first batch of entities, providing reasonable search quality before the full embedding model has accumulated enough training data.

Info Gain Pruning (KG-005)

Information-gain-based pruning removes redundant evidence edges that contribute minimal new information to the causal graph. Each edge is scored by its marginal information gain relative to existing paths between the same endpoints. Edges below the pruning threshold are candidates for removal, reducing graph density without losing meaningful causal structure.

Causal Chain Tracing (KG-003)

Typed BFS traversal that follows specific edge types through the causal graph, producing natural-language explanations of causal chains. Given a source event and target outcome, it enumerates all causal paths up to a configurable depth, filtering by edge type (Causes, Enables, TriggeredBy) and producing human-readable explanations of each chain.

weaver ecc trace --from event_123 --to outcome_456 --depth 5 --types causes,enables

Resource Tree Namespaces

6 namespaces under /kernel/services/ecc/:

  • /kernel/services/ecc/causal
  • /kernel/services/ecc/hnsw
  • /kernel/services/ecc/crossref
  • /kernel/services/ecc/impulse
  • /kernel/services/ecc/tick
  • /kernel/services/ecc/calibration

CLI Commands

weaver ecc status          # Show ECC subsystem status
weaver ecc calibrate       # Run calibration benchmark
weaver ecc search <query>  # HNSW vector search
weaver ecc causal          # Display causal graph info
weaver ecc crossrefs       # List cross-references
weaver ecc tick            # Show cognitive tick stats

Built-in Tools

7 ECC tools in the tool catalog (behind ecc feature):

| Tool | Description |
| --- | --- |
| ecc.status | ECC subsystem status |
| ecc.calibrate | Run calibration |
| ecc.search | HNSW search |
| ecc.causal.add | Add causal node |
| ecc.causal.link | Create causal edge |
| ecc.crossref.add | Add cross-reference |
| ecc.impulse.emit | Emit an impulse |

WeaverEngine

Source: crates/clawft-kernel/src/weaver.rs (~4,800 lines, 126 tests)

The WeaverEngine is a SystemService that drives iterative causal modeling over data sources. It runs a HYPOTHESIZE-OBSERVE-EVALUATE-ADJUST loop within the cognitive tick, maintaining confidence scores and self-improvement tracking via a meta-Loom.

Capabilities

  • Modeling sessions: Start, stop, resume domain-specific modeling sessions
  • Data ingestion: Git history, file trees, CI pipelines, issue trackers, documentation, SPARC plans, and custom streams (7 source types)
  • Confidence evaluation: Edge coverage scoring, orphan detection, gap suggestions
  • Confidence history: Ring-buffer of snapshots with trend analysis (improving/stable/declining)
  • Strategy tracking: Records which analysis strategies improved confidence, recommends effective strategies
  • Model export/import: ExportedModel serialization with causal nodes/edges for edge device deployment
  • Model diff: diff_models() compares two exported models, producing ModelDiff with node/edge/confidence deltas
  • Model merge: merge_models() stitches models from different sessions, resolving conflicts by higher confidence
  • Meta-Loom: MetaLoomEvent and MetaDecisionType track the Weaver's own evolution
  • WeaverKnowledgeBase: Cross-domain pattern library with learn_pattern()/find_patterns() for learning and lookup, and save_to_file()/load_from_file() for persistence
  • Tick interval recommendation: Analyzes change frequency to suggest optimal tick intervals
  • Git polling: Incremental commit detection via GitPoller
  • File watching: mtime-based change detection via FileWatcher

Embedding Providers

Source: crates/clawft-kernel/src/embedding.rs and embedding_onnx.rs (62 tests)

| Provider | Dimensions | Use Case |
| --- | --- | --- |
| MockEmbeddingProvider | configurable | Deterministic SHA-256 hash vectors for testing |
| LlmEmbeddingProvider | 384 (default) | LLM API embeddings with mock fallback |
| OnnxEmbeddingProvider | 384 | all-MiniLM-L6-v2 with hash fallback |
| SentenceTransformerProvider | 384 | Markdown-aware, sentence-splitting, mean pooling |
| AstEmbeddingProvider | 256 | Hybrid structural+semantic for Rust code |

DEMOCRITUS Cognitive Loop

The DEMOCRITUS loop (democritus.rs) is the integration layer that runs on every cognitive tick: Sense → Embed → Search → Update → Commit. It drains the ImpulseQueue, embeds events via the configured provider, queries HNSW for correlated neighbors, updates the CausalGraph with inferred edges, and registers CrossRefs. See the dedicated DEMOCRITUS page for configuration, tuning, and API details.

Persistence

CausalGraph and HNSW state can be saved to and restored from disk via the persistence coordinator. See Persistence for file layout, API, and recovery behavior.

CLI Commands

weaver ecc session start --domain my-project --git .
weaver ecc source add --domain my-project --type ci_pipeline
weaver ecc confidence --domain my-project --verbose
weaver ecc export --domain my-project --output weave-model.json
weaver ecc stitch --source frontend --target backend
weaver ecc meta --domain my-project
weaver ecc meta export-kb --output weaver-kb.json

IPC Protocol

WeaverCommand (13 variants) and WeaverResponse for communication with CLI and agents via KernelMessage.

O(1) Coherence Approximation (EML)

Source: crates/clawft-kernel/src/eml_coherence.rs (16 tests)
Phase: K3c
See also: EML -- Self-Learning Functions for the full EML system covering all 16 models across the stack.

The EML coherence model replaces the expensive O(k*m) Lanczos eigenvalue iteration with a 34-parameter depth-3 master formula that predicts algebraic connectivity (lambda_2) from cheap graph statistics in O(1) time. This converts coherence checking from ~500us to ~100ns -- roughly a 5000x speedup -- enabling coherence evaluation at the 10,000 Hz tick rate required for robotics workloads.

The EML Operator

The model is built on the EML universal operator discovered by Odrzywolel (2026):

eml(x, y) = exp(x) - ln(y)

This is the continuous-mathematics analog of the NAND gate: combined with the constant 1, it can reconstruct all elementary mathematical functions (arithmetic, exponentials, logarithms, trigonometry, roots). Any expression forms a binary tree under the grammar S -> 1 | eml(S, S).
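Two small identities show how the operator, together with the constant 1, recovers familiar functions. The derivation in the comments is worked out here for illustration and is not quoted from the paper; the function names exp_via_eml and ln_via_eml are hypothetical.

```rust
/// The EML operator: eml(x, y) = exp(x) - ln(y), defined for y > 0.
fn eml(x: f64, y: f64) -> f64 {
    x.exp() - y.ln()
}

/// exp(x) = eml(x, 1), because ln(1) = 0.
fn exp_via_eml(x: f64) -> f64 {
    eml(x, 1.0)
}

/// ln(y) = eml(1, eml(eml(1, y), 1)), step by step:
///   eml(1, y)         = e - ln(y)
///   eml(e - ln(y), 1) = exp(e - ln(y)) = e^e / y
///   eml(1, e^e / y)   = e - (e - ln(y)) = ln(y)
fn ln_via_eml(y: f64) -> f64 {
    eml(1.0, eml(eml(1.0, y), 1.0))
}
```

Both expressions fit the grammar above once the variable is allowed as a leaf alongside the constant 1. In floating point the nested form loses a little precision compared with calling ln directly.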

Reference: Odrzywolel, A. "All elementary functions from a single operator." arXiv:2603.21852v2 [cs.SC], April 2026. Jagiellonian University, Krakow.

Architecture: Depth-3 Master Formula

The model uses a depth-3 EML tree with softmax-constrained weights:

Level 0: 8 linear combinations of 7 input features (24 params)
  a_i = softmax(alpha, beta, gamma) . (1, x_j, x_k)

Level 1: 4 EML nodes
  b_0 = eml(a_0, a_1), b_1 = eml(a_2, a_3), ...

Level 2: 2 EML nodes with mixing (8 params)
  c_0 = eml(mix(b_0, b_1), mix(b_2, b_3))
  c_1 = eml(mix(b_0, b_1), mix(b_2, b_3))

Level 3: 1 EML node with mixing (2 params)
  result = eml(mix(c_0), mix(c_1))

Total: 34 trainable parameters. The softmax constraint forces alpha + beta + gamma = 1 at each level-0 node, which during training pushes weights toward exact 1 values -- recovering interpretable closed-form expressions.

Graph Features

The model predicts lambda_2 from 7 statistics extracted in O(n) time:

| Feature | Description |
| --- | --- |
| node_count | Number of nodes \|V\| |
| edge_count | Number of edges \|E\| |
| avg_degree | Average degree: 2*\|E\| / \|V\| |
| max_degree | Maximum degree across all nodes |
| min_degree | Minimum degree across all nodes |
| density | Edge density: 2*\|E\| / (\|V\| * (\|V\|-1)) |
| component_count | Number of connected components |
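The seven statistics can be computed in a single pass over an edge list. The sketch below is one possible extraction, assuming an undirected simple graph on nodes 0..n; GraphFeatures and extract_features are hypothetical names, and counting components with union-find makes the cost near-linear in nodes plus edges.

```rust
#[derive(Debug, Clone, PartialEq)]
struct GraphFeatures {
    node_count: f64,
    edge_count: f64,
    avg_degree: f64,
    max_degree: f64,
    min_degree: f64,
    density: f64,
    component_count: f64,
}

/// Extract the seven features from an undirected simple graph on nodes 0..n.
fn extract_features(n: usize, edges: &[(usize, usize)]) -> GraphFeatures {
    /// Union-find root lookup with path halving.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let mut degree = vec![0usize; n];
    let mut parent: Vec<usize> = (0..n).collect();
    for &(u, v) in edges {
        degree[u] += 1;
        degree[v] += 1;
        let (ru, rv) = (find(&mut parent, u), find(&mut parent, v));
        if ru != rv {
            parent[ru] = rv; // merge the two components
        }
    }
    // Each root that is its own parent represents one connected component.
    let components = (0..n).filter(|&i| find(&mut parent, i) == i).count();

    let (nf, mf) = (n as f64, edges.len() as f64);
    GraphFeatures {
        node_count: nf,
        edge_count: mf,
        avg_degree: if n > 0 { 2.0 * mf / nf } else { 0.0 },
        max_degree: degree.iter().copied().max().unwrap_or(0) as f64,
        min_degree: degree.iter().copied().min().unwrap_or(0) as f64,
        density: if n > 1 { 2.0 * mf / (nf * (nf - 1.0)) } else { 0.0 },
        component_count: components as f64,
    }
}
```

For a path 0-1-2 plus an isolated node 3, this yields avg_degree 1.0, max_degree 2, min_degree 0, density 1/3, and component_count 2.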

Two-Tier Coherence Pattern

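The EML model integrates into the DEMOCRITUS cognitive loop via a two-tier pattern (a fast path on every tick and an exact fallback on drift), with periodic retraining closing the loop: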

  1. Every tick: CausalGraph::coherence_fast(&model) runs the O(1) EML prediction (~100ns). If the predicted coherence is within acceptable bounds, no further work is needed.

  2. On drift detection: When the fast prediction indicates coherence has drifted beyond the threshold, the system falls back to spectral_analysis() for an exact O(k*m) Lanczos computation (~500us). After computing the exact value, model.record(features, lambda_2) feeds the result into the training buffer.

  3. Periodic retraining: When 50+ exact samples have accumulated, model.train() refines the 34 parameters via random restart + coordinate descent. The model converges when MSE drops below 0.01.

This pattern is self-improving: the model trains on the actual causal graphs the system processes during operation. As the system encounters more graphs from its operational domain, predictions become increasingly accurate.
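The control flow of the steps above can be sketched with closures standing in for the real calls. Everything here is hypothetical scaffolding: TwoTier and Outcome are illustrative types, the fast closure plays the role of coherence_fast, the exact closure plays the role of spectral_analysis, and treating "drift" as the prediction falling below a lower threshold is an assumption.

```rust
/// Hypothetical driver for the fast-path / exact-fallback pattern.
struct TwoTier {
    threshold: f64,               // assumed lower bound on acceptable coherence
    min_samples: usize,           // 50 in the text above
    buffer: Vec<([f64; 7], f64)>, // recorded (features, exact lambda_2) pairs
}

#[derive(Debug, PartialEq)]
enum Outcome {
    FastPathOk(f64),
    ExactFallback { lambda_2: f64, ready_to_train: bool },
}

impl TwoTier {
    fn tick(
        &mut self,
        features: [f64; 7],
        fast: impl Fn(&[f64; 7]) -> f64,
        exact: impl FnOnce() -> f64,
    ) -> Outcome {
        // Tier 1: cheap prediction on every tick.
        let predicted = fast(&features);
        if predicted >= self.threshold {
            return Outcome::FastPathOk(predicted);
        }
        // Tier 2: exact computation only when the prediction signals drift,
        // and the exact value becomes a training sample.
        let lambda_2 = exact();
        self.buffer.push((features, lambda_2));
        Outcome::ExactFallback {
            lambda_2,
            ready_to_train: self.buffer.len() >= self.min_samples,
        }
    }
}
```

The exact closure is FnOnce and is only invoked on the fallback branch, so the expensive computation is never paid for on ticks where the fast prediction is within bounds.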

Training

Training uses a gradient-free optimization strategy suitable for 34 parameters:

  • Phase 1: 100 random restarts to find a good basin
  • Phase 2: Coordinate descent refinement with 6 step sizes (+-0.1, +-0.01, +-0.001)
  • Convergence: MSE < 0.01 over the training set
  • Minimum data: 50 recorded (features, lambda_2) pairs required before training begins

Convergence has been verified on 5 standard graph families: complete graphs (K_n), star graphs, cycle graphs, path graphs, and Erdos-Renyi random graphs.
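The two training phases can be illustrated on a generic loss function. This is a minimal sketch of random restarts followed by coordinate descent, not the model's actual train() implementation: the Lcg generator, the [-1, 1) sampling range, the sweep limit, and the function name train_sketch are all assumptions.

```rust
/// Tiny deterministic generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Pseudo-random value in [-1, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 / (1u64 << 53) as f64) * 2.0 - 1.0
    }
}

/// Phase 1: keep the best of `restarts` random starting points.
/// Phase 2: coordinate descent, trying each signed step on each parameter.
fn train_sketch<L: Fn(&[f64]) -> f64>(
    loss: L,
    dims: usize,
    restarts: usize,
    max_sweeps: usize,
    seed: u64,
) -> (Vec<f64>, f64) {
    let mut rng = Lcg(seed);
    let mut best = vec![0.0; dims];
    let mut best_loss = loss(&best);
    for _ in 0..restarts {
        let candidate: Vec<f64> = (0..dims).map(|_| rng.next_unit()).collect();
        let l = loss(&candidate);
        if l < best_loss {
            best = candidate;
            best_loss = l;
        }
    }

    // The six step sizes from the list above.
    const STEPS: [f64; 6] = [0.1, -0.1, 0.01, -0.01, 0.001, -0.001];
    for _ in 0..max_sweeps {
        let mut improved = false;
        for i in 0..dims {
            for step in STEPS {
                best[i] += step;
                let l = loss(&best);
                if l < best_loss {
                    best_loss = l;
                    improved = true;
                } else {
                    best[i] -= step; // revert a step that did not help
                }
            }
        }
        if !improved {
            break; // no step improves any coordinate
        }
    }
    (best, best_loss)
}
```

In the real model the loss would be the MSE between predicted and recorded lambda_2 values over the training buffer, with dims = 34.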

API

// O(1) approximate coherence from EML model
let coherence = graph.coherence_fast(&model);

// Record an exact measurement for training
model.record(features, exact_lambda_2);

// Train when enough data has accumulated
let converged = model.train();

// Check model status
model.is_trained();
model.training_sample_count();
model.mean_error();

// Direct prediction from features
let lambda_2 = model.predict(&features);

Causal Collapse Prediction

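The causal collapse module (crates/clawft-kernel/src/causal_predict.rs) extends the coherence system with predictive edge ranking. First-order eigenvalue perturbation theory estimates the change in lambda_2 from adding an edge (u, v) of weight w, where phi is the Fiedler vector (the eigenvector associated with lambda_2):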

delta_lambda_2 = w * (phi[u] - phi[v])^2

The CausalCollapseModel adds a learned EML correction term (depth-3, 9 inputs) to the analytical formula, learning higher-order effects from operational data. See the full Causal Collapse Prediction section for details on rank_evidence_by_impact() and conversation cycle detection.
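The analytical part of the formula is simple enough to sketch directly. The code below covers only the first-order term and omits the learned EML correction; rank_by_first_order is a hypothetical helper and is not the rank_evidence_by_impact() function referenced above.

```rust
/// First-order estimate of the change in lambda_2 for an edge (u, v) with
/// weight w, given the Fiedler vector phi indexed by node.
fn delta_lambda_2(w: f64, phi: &[f64], u: usize, v: usize) -> f64 {
    w * (phi[u] - phi[v]).powi(2)
}

/// Score each candidate edge (u, v, w) and return (u, v, delta) tuples,
/// largest estimated impact first.
fn rank_by_first_order(phi: &[f64], candidates: &[(usize, usize, f64)]) -> Vec<(usize, usize, f64)> {
    let mut scored: Vec<(usize, usize, f64)> = candidates
        .iter()
        .map(|&(u, v, w)| (u, v, delta_lambda_2(w, phi, u, v)))
        .collect();
    scored.sort_by(|a, b| b.2.total_cmp(&a.2));
    scored
}
```

Edges whose endpoints sit on opposite sides of the Fiedler vector score highest, since they bridge the two halves of the weakest cut; edges between nodes with equal phi values score zero.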

Configuration

The EML coherence model is fully automatic -- no manual configuration is required. The model initializes untrained and uses a density-based fallback (density * avg_degree) until enough exact measurements have been collected. Training happens in-band when the caller invokes model.train() after sufficient data accumulation.

Why It Matters for Robotics

Real-time robotic systems require coherence checking at rates matching their control loops (1,000--10,000 Hz). The standard Lanczos iteration at ~500us per call limits coherence checking to ~2,000 Hz -- insufficient for high-frequency control. The EML approximation at ~100ns per call supports coherence checking at 10,000,000 Hz, providing three orders of magnitude of headroom above the 10,000 Hz target.
