
EML — Self-Learning Functions

How WeftOS uses the EML operator (exp(x) - ln(y)) for O(1) learned functions that replace hardcoded heuristics across the entire stack.

WeftOS replaces hardcoded heuristics with learned functions that improve during operation. Every threshold, scoring formula, and tuning parameter that was once a magic constant is now a trainable EML model that converges toward the optimal function for the domain it runs in.

Feature: ecc
Source: crates/eml-core/ (standalone) + 17 domain-specific wrappers across 4 crates
Phase: K3c

What is EML?

The EML operator is the continuous-mathematics analog of the NAND gate:

eml(x, y) = exp(x) - ln(y)

Combined with the constant 1, this single operator can reconstruct all elementary mathematical functions: arithmetic, exponentials, logarithms, trigonometry, roots. Every such expression is a binary tree generated by the grammar S -> 1 | eml(S, S).

Reference: Odrzywolel, A. "All elementary functions from a single operator." arXiv:2603.21852v2 [cs.SC], April 2026. Jagiellonian University, Krakow.
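As a small illustration of how other functions fall out of the operator, the sketch below derives e, exp(x), and ln(x) from eml and the constant 1. The helper names (exp_via_eml, ln_via_eml) are hypothetical and not part of eml-core; the identities follow directly from the definition above.

```rust
/// The EML operator from the text: eml(x, y) = exp(x) - ln(y).
fn eml(x: f64, y: f64) -> f64 {
    x.exp() - y.ln()
}

/// exp(x) = eml(x, 1), because ln(1) = 0.
fn exp_via_eml(x: f64) -> f64 {
    eml(x, 1.0)
}

/// ln(x) = eml(1, eml(eml(1, x), 1)) for x > 0.
/// Inner:  eml(1, x)         = e - ln(x)
/// Middle: eml(e - ln(x), 1) = exp(e - ln(x)) = e^e / x
/// Outer:  eml(1, e^e / x)   = e - (e - ln(x)) = ln(x)
fn ln_via_eml(x: f64) -> f64 {
    eml(1.0, eml(eml(1.0, x), 1.0))
}

fn main() {
    // The constant e itself is eml(1, 1).
    println!("e      = {}", eml(1.0, 1.0));
    println!("exp(2) = {}", exp_via_eml(2.0));
    println!("ln(2)  = {}", ln_via_eml(2.0));
}
```

Each helper is a tree over the grammar S -> 1 | eml(S, S) with the input substituted at a leaf, which is the sense in which one operator plus one constant is enough.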

Key Properties

  • Weight snapping: During training, softmax-constrained weights at level-0 nodes push toward exact 0 or 1 values. This means trained models are interpretable -- you can read off the closed-form expression the model has learned.
  • Composability: Any EML tree can be nested inside another. A depth-3 tree is just 7 EML nodes arranged hierarchically.
  • Gradient-free training: Coordinate descent with random restarts works well for the modest parameter counts (roughly 20-80 params, depending on depth), avoiding the need for automatic differentiation.
  • Deterministic evaluation: No randomness at inference time. Same inputs always produce same outputs.
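The weight-snapping property can be pictured with a plain softmax over three logits. This is a minimal sketch, not the eml-core implementation: softmax3, level0_node, and snap are hypothetical names, and the rounding step is an assumed way of reading off a snapped expression.

```rust
/// Numerically stable softmax over three logits.
fn softmax3(logits: [f64; 3]) -> [f64; 3] {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps = logits.map(|l| (l - max).exp());
    let sum: f64 = exps.iter().sum();
    exps.map(|e| e / sum)
}

/// Level-0 style node: a convex combination of (1, x_j, x_k).
fn level0_node(logits: [f64; 3], x_j: f64, x_k: f64) -> f64 {
    let w = softmax3(logits);
    w[0] * 1.0 + w[1] * x_j + w[2] * x_k
}

/// Round each weight to 0 or 1 (assumed "snap" step, for illustration only).
fn snap(weights: [f64; 3]) -> [f64; 3] {
    weights.map(|w| w.round())
}

fn main() {
    // Nearly equal logits give a soft mixture of all three terms.
    let soft = softmax3([0.1, 0.2, 0.0]);
    // One dominant logit drives the weights toward a one-hot vector.
    let sharp = softmax3([0.0, 12.0, 0.0]);
    println!("soft  = {:?}", soft);
    println!("sharp = {:?} -> snapped {:?}", sharp, snap(sharp));
    // With the middle weight near 1, the node effectively outputs x_j.
    println!("node  = {}", level0_node([0.0, 12.0, 0.0], 0.7, 0.3));
}
```

Once a node's weights sit at a one-hot vector, the node is simply "1", "x_j", or "x_k", which is what makes the learned tree readable as a closed-form expression.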

The eml-core Crate

eml-core is a standalone crate whose only dependency is serde (for serialization). It provides the generic EML machinery that all domain-specific models build on.

Creating a Model

use eml_core::EmlModel;

// Create a depth-4 model with 3 inputs and 1 output head
let mut model = EmlModel::new(4, 3, 1);

// Record training data
for i in 0..100 {
    let x = [i as f64 / 100.0, i as f64 / 50.0, i as f64 / 200.0];
    let y = x[0] + x[1] + x[2];
    model.record(&x, &[Some(y)]);
}

// Train
let converged = model.train();

// Predict
let prediction = model.predict_primary(&[0.5, 1.0, 0.25]);

Architecture

Level 0: 8 affine combinations of input features (24 params)
  a_i = softmax(alpha, beta, gamma) . (1, x_j, x_k)

Level 1: 4 EML nodes (no params -- pure EML pairing)
  b_0 = eml(a_0, a_1), b_1 = eml(a_2, a_3), ...

Level 2: mixing + EML (depth-dependent params)

Level D: multi-head output (2 params per head)

Supported depths: 2, 3, 4, 5. Each depth adds more representational capacity at the cost of more parameters and slightly slower inference.
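The first two levels of the architecture can be sketched as a forward pass. The wiring of inputs to level-0 nodes is not specified here, so the index choice below (node i reads inputs i % n and (i + 1) % n) is purely an assumption, and level0 and level1 are hypothetical function names rather than eml-core API.

```rust
fn eml(x: f64, y: f64) -> f64 {
    x.exp() - y.ln()
}

fn softmax3(logits: [f64; 3]) -> [f64; 3] {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps = logits.map(|l| (l - max).exp());
    let sum: f64 = exps.iter().sum();
    exps.map(|e| e / sum)
}

/// Level 0: eight nodes, each mixing (1, x_j, x_k) with softmax weights.
/// `logits` holds the 8 x 3 = 24 trainable parameters.
/// ASSUMPTION: node i reads x[i % n] and x[(i + 1) % n]; `x` must be non-empty.
fn level0(logits: &[[f64; 3]; 8], x: &[f64]) -> [f64; 8] {
    let n = x.len();
    let mut a = [0.0; 8];
    for i in 0..8 {
        let w = softmax3(logits[i]);
        a[i] = w[0] + w[1] * x[i % n] + w[2] * x[(i + 1) % n];
    }
    a
}

/// Level 1: four parameter-free EML nodes pairing adjacent level-0 outputs.
fn level1(a: &[f64; 8]) -> [f64; 4] {
    let mut b = [0.0; 4];
    for i in 0..4 {
        b[i] = eml(a[2 * i], a[2 * i + 1]);
    }
    b
}

fn main() {
    // All-zero logits give uniform weights: each node averages (1, x_j, x_k).
    let logits = [[0.0; 3]; 8];
    let x = [0.5, 1.0, 0.25];
    let a = level0(&logits, &x);
    let b = level1(&a);
    println!("a = {:?}", a);
    println!("b = {:?}", b);
}
```

Because each level-0 output is a convex combination of 1 and the inputs, positive inputs keep the ln argument at level 1 positive; how the real crate guards other input ranges is not covered by this sketch.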

API

| Method | Description |
|---|---|
| EmlModel::new(depth, inputs, heads) | Create untrained model |
| model.record(&inputs, &targets) | Add training sample |
| model.train() -> bool | Train via coordinate descent; returns convergence |
| model.predict(&inputs) -> Vec<f64> | Multi-head prediction |
| model.predict_primary(&inputs) -> f64 | First-head prediction |
| model.is_trained() -> bool | Check training status |
| model.training_sample_count() -> usize | Number of recorded samples |
| model.mean_error() -> f64 | Current MSE |
| model.drain_events() -> Vec<EmlEvent> | Drain lifecycle events for ExoChain |
| model.to_json() / from_json() | Persistence |

FeatureVector Trait

Types that implement FeatureVector can be passed directly to EML models without manual conversion:

pub trait FeatureVector {
    fn as_features(&self) -> &[f64];
}
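A minimal sketch of implementing the trait for a domain type follows. HealthSample, its field names, and feature_sum are hypothetical; only the trait definition is taken from the text above.

```rust
/// Trait as given in the text.
pub trait FeatureVector {
    fn as_features(&self) -> &[f64];
}

/// Hypothetical domain type: three health-probe measurements stored
/// contiguously so they can be borrowed as a slice without copying.
pub struct HealthSample {
    features: [f64; 3],
}

impl HealthSample {
    pub fn new(latency_ms: f64, error_rate: f64, cpu_load: f64) -> Self {
        Self { features: [latency_ms, error_rate, cpu_load] }
    }
}

impl FeatureVector for HealthSample {
    fn as_features(&self) -> &[f64] {
        &self.features
    }
}

/// Stand-in for a model entry point that accepts any FeatureVector.
fn feature_sum<T: FeatureVector>(sample: &T) -> f64 {
    sample.as_features().iter().sum()
}

fn main() {
    let s = HealthSample::new(12.5, 0.25, 0.5);
    println!("features = {:?}, sum = {}", s.as_features(), feature_sum(&s));
}
```

Storing the features in a fixed-size array is one way to satisfy the borrowed-slice return type; a type that computes features on the fly would need to cache them first.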

Where EML is Used

17 EML models span 4 crates, replacing hardcoded heuristics across the entire WeftOS stack:

Kernel: Coherence (eml_coherence.rs)

| Model | Depth | Inputs | Outputs | What It Learns | Replaces |
|---|---|---|---|---|---|
| EmlCoherenceModel (full) | 4 | 7 graph features | 3 (lambda_2, fiedler_norm, uncertainty) | Algebraic connectivity prediction | O(k*m) Lanczos iteration |
| EmlCoherenceModel (fast) | 3 | 7 graph features | 1 (lambda_2) | Quick coherence check | O(k*m) Lanczos iteration |

Kernel: Governance and Operations (eml_kernel.rs)

| Model | Depth | Inputs | Outputs | What It Learns | Replaces |
|---|---|---|---|---|---|
| GovernanceScorerModel | 3 | 5 EffectVector dims | 1 composite score | Dimension importance weighting | L2 norm |
| RestartStrategyModel | 2 | 4 failure features | 2 (delay, should_retry) | Optimal restart delay | Fixed backoff |
| HealthThresholdModel | 2 | 3 health features | 2 (degraded, failed) | Adaptive probe thresholds | Fixed thresholds |
| DeadLetterModel | 2 | 3 retry features | 2 (delay, should_discard) | Smart retry policy | Fixed retry policy |
| GossipTimingModel | 2 | 3 network features | 1 interval | Network-adaptive gossip interval | Fixed gossip interval |
| ComplexityModel | 2 | 3 code features | 1 threshold | Context-sensitive complexity limits | 500-line threshold |

Kernel: HNSW Search Optimization (hnsw_eml.rs)

| Model | Depth | Inputs | Outputs | What It Learns | Replaces |
|---|---|---|---|---|---|
| DistanceModel | 3 | 4 selected dims | 1 distance | Domain-specific dimension selection | Full cosine similarity |
| AdaptiveEfModel | 3 | 4 query features | 1 beam width | Per-query optimal beam width | Fixed ef=100 |
| PathModel | 3 | 4 query features | 1 entry point | Search entry-point prediction | Random entry |
| RebuildModel | 3 | 4 recall features | 1 rebuild signal | When to rebuild HNSW graph | Fixed schedule |

Kernel: Causal Prediction (causal_predict.rs)

| Model | Depth | Inputs | Outputs | What It Learns | Replaces |
|---|---|---|---|---|---|
| CausalCollapseModel | 3 | 9 edge features | 1 correction | Higher-order correction to delta_lambda_2 | Analytical-only prediction |

Graphify: Knowledge Graph (eml_models.rs)

| Model | Depth | Inputs | Outputs | What It Learns | Replaces |
|---|---|---|---|---|---|
| SurpriseScorerModel | 3 | 7 edge features | 1 surprise score | Non-linear surprise factors | Linear weighted scoring |
| ClusterThresholdModel | 2 | 3 topology features | 3 thresholds | Optimal community detection params | Fixed constants |
| LayoutModel | 3 | 3 graph features | 6 physics params | ForceAtlas2 physics tuning | Hardcoded physics |
| ForensicCoherenceModel | 3 | 4 graph stats | 1 coherence | Domain-specific coherence | density * avg_confidence |
| QueryFusionModel | 3 | 4 scoring dims | 1 relevance | Hybrid keyword+graph+community+type scoring | Linear weighted sum |

EML Distillation (KG-017)

Model distillation compresses a trained depth-4 EML model into a depth-2 model with minimal accuracy loss. The distillation process generates synthetic training data by evaluating the teacher model across the input space, then trains the student model on those predictions. This is useful for deploying EML models on resource-constrained edge devices (T0/T1 kernel profiles) where inference latency must be minimized.

use eml_core::{distill, EmlModel};

let teacher = EmlModel::from_json(&saved_depth4)?;
let student = distill(&teacher, 2, 1000);  // depth-2 student, 1000 synthetic samples
// student.predict() runs ~3x faster than teacher

Typical accuracy retention: the student keeps 95-98% of the teacher's accuracy (measured by MSE) for well-conditioned models. The distilled model preserves the two-tier pattern -- it just runs the fast path even faster.
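The teacher/student idea can be sketched without eml-core. In the example below the teacher is a single closure standing in for a trained model and the student is a straight line fitted by least squares; both are stand-ins chosen for brevity, and distill_linear and fit_line are hypothetical names, not the crate's distill function.

```rust
/// Fit y = a + b * x by ordinary least squares; returns (a, b).
fn fit_line(xs: &[f64], ys: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    let mut sxy = 0.0;
    let mut sxx = 0.0;
    for (x, y) in xs.iter().zip(ys) {
        sxy += (x - mean_x) * (y - mean_y);
        sxx += (x - mean_x) * (x - mean_x);
    }
    let b = sxy / sxx;
    (mean_y - b * mean_x, b)
}

/// Distill `teacher` into a linear student: evaluate the teacher on
/// `samples` evenly spaced points in [lo, hi] (samples >= 2), then fit.
fn distill_linear(teacher: impl Fn(f64) -> f64, lo: f64, hi: f64, samples: usize) -> (f64, f64) {
    let xs: Vec<f64> = (0..samples)
        .map(|i| lo + (hi - lo) * i as f64 / (samples - 1) as f64)
        .collect();
    // Synthetic training targets come from the teacher, not from real data.
    let ys: Vec<f64> = xs.iter().map(|&x| teacher(x)).collect();
    fit_line(&xs, &ys)
}

fn main() {
    // Stand-in teacher: one EML-shaped expression, exp(x) - ln(x + 1), on [0, 1].
    let teacher = |x: f64| x.exp() - (x + 1.0).ln();
    let (a, b) = distill_linear(teacher, 0.0, 1.0, 1000);
    // Measure how much accuracy the simpler student gives up.
    let mse: f64 = (0..=100)
        .map(|i| {
            let x = i as f64 / 100.0;
            let err = a + b * x - teacher(x);
            err * err
        })
        .sum::<f64>()
        / 101.0;
    println!("student: y = {:.4} + {:.4} * x, MSE vs teacher = {:.6}", a, b, mse);
}
```

The real student is a shallower EML tree rather than a line, but the data flow is the same: sample the teacher across the input space, then train the cheaper model on those predictions.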

The Two-Tier Pattern

Every EML model in WeftOS follows the same two-tier pattern:

+-----------------------+       +-----------------------+
|  FAST PATH (every op) |       |  GROUND TRUTH (periodic)|
|                       |       |                       |
|  EML prediction       |       |  Exact computation    |
|  ~100ns               |       |  ~500us               |
|                       |       |                       |
|  if drift > threshold ------->|  result feeds back    |
|                       |       |  into training buffer |
+-----------------------+       +-----------+-----------+
                                            |
                                            v
                                +-----------+-----------+
                                |  RETRAIN (every N)    |
                                |                       |
                                |  Coordinate descent   |
                                |  ~1ms for 34 params   |
                                +-----------------------+
  1. Every tick/operation: The EML model provides an O(1) prediction (~100ns). If the result is within acceptable bounds, no further work is needed.

  2. On drift detection: When the fast prediction diverges from expected behavior beyond a threshold, the system falls back to the exact (expensive) computation. The exact result is recorded as a training sample.

  3. Periodic retraining: After enough exact samples accumulate (typically 50+), model.train() refines parameters via random restart + coordinate descent.

This pattern is self-improving: models train on the actual data the system processes during operation. As the system encounters more cases from its operational domain, predictions become increasingly accurate. No manual tuning is required.
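The three steps above can be sketched as a small state machine. Everything here is illustrative: TwoTier, exact, and the field names are hypothetical, the "model" is a single scale factor rather than an EML tree, and drift detection is assumed to be a periodic spot check against the exact computation.

```rust
/// Stand-in for the expensive exact computation (e.g. an eigenvalue solve).
fn exact(x: f64) -> f64 {
    2.0 * x
}

/// Minimal two-tier wrapper around a one-parameter learned predictor.
struct TwoTier {
    scale: f64,              // learned parameter: prediction = scale * x
    buffer: Vec<(f64, f64)>, // (input, exact result) samples awaiting retraining
    check_every: usize,      // how often the exact path is consulted (assumed)
    drift_threshold: f64,    // relative error that counts as drift
    retrain_after: usize,    // samples required before retraining
    ops: usize,
}

impl TwoTier {
    fn new(initial_scale: f64) -> Self {
        Self {
            scale: initial_scale,
            buffer: Vec::new(),
            check_every: 10,
            drift_threshold: 0.05,
            retrain_after: 50,
            ops: 0,
        }
    }

    /// One operation: fast prediction, periodic ground-truth check,
    /// and retraining once enough drifted samples have accumulated.
    fn step(&mut self, x: f64) -> f64 {
        self.ops += 1;
        let fast = self.scale * x;
        if self.ops % self.check_every != 0 {
            return fast; // fast path: no exact computation
        }
        let truth = exact(x);
        let drift = (fast - truth).abs() / truth.abs().max(f64::EPSILON);
        if drift <= self.drift_threshold {
            return fast;
        }
        // Drift detected: record the exact result and fall back to it.
        self.buffer.push((x, truth));
        if self.buffer.len() >= self.retrain_after {
            self.retrain();
        }
        truth
    }

    /// Least-squares fit of the scale: sum(x * y) / sum(x * x).
    fn retrain(&mut self) {
        let sxy: f64 = self.buffer.iter().map(|(x, y)| x * y).sum();
        let sxx: f64 = self.buffer.iter().map(|(x, _)| x * x).sum();
        if sxx > 0.0 {
            self.scale = sxy / sxx;
        }
        self.buffer.clear();
    }
}

fn main() {
    let mut tier = TwoTier::new(1.0); // deliberately wrong starting model
    for i in 0..1000 {
        tier.step(1.0 + (i % 7) as f64);
    }
    println!("learned scale = {}", tier.scale);
}
```

Starting from a wrong model, the wrapper serves exact results whenever a spot check catches drift, and after retraining it returns to the cheap path.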

Causal Collapse Prediction

The causal collapse prediction module (causal_predict.rs) is one of the most impactful applications of EML. It predicts how adding a new edge will change the causal graph's algebraic connectivity (lambda_2) without recomputing the expensive eigenvalue decomposition.

The Core Formula

First-order eigenvalue perturbation theory gives:

delta_lambda_2 = w * (phi[u] - phi[v])^2

where phi is the Fiedler vector and w is the edge weight. Edges that bridge the spectral partition (phi[u] and phi[v] have opposite signs) produce the largest coherence gains.
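A minimal sketch of applying this formula to a list of candidate edges is shown below. The function names predicted_delta and rank_candidates are hypothetical (they are not the crate's rank_evidence_by_impact), and the Fiedler vector values are made up for illustration.

```rust
/// First-order estimate of the change in lambda_2 from adding
/// an edge (u, v) with weight w: w * (phi[u] - phi[v])^2.
fn predicted_delta(phi: &[f64], u: usize, v: usize, w: f64) -> f64 {
    let d = phi[u] - phi[v];
    w * d * d
}

/// Score each candidate (u, v, w) and sort by predicted delta, largest first.
/// Returns (u, v, predicted_delta) triples.
fn rank_candidates(phi: &[f64], candidates: &[(usize, usize, f64)]) -> Vec<(usize, usize, f64)> {
    let mut scored: Vec<(usize, usize, f64)> = candidates
        .iter()
        .map(|&(u, v, w)| (u, v, predicted_delta(phi, u, v, w)))
        .collect();
    scored.sort_by(|a, b| b.2.total_cmp(&a.2));
    scored
}

fn main() {
    // Hypothetical Fiedler vector: nodes 0 and 1 on one side of the
    // spectral partition, nodes 2 and 3 on the other.
    let phi = [-0.6, -0.4, 0.3, 0.7];
    let candidates = [(0, 1, 1.0), (0, 3, 1.0), (1, 2, 1.0)];
    for (u, v, delta) in rank_candidates(&phi, &candidates) {
        println!("{} -> {}: delta = {:.4}", u, v, delta);
    }
}
```

The edge 0 -> 3 ranks first because its endpoints sit at opposite extremes of the Fiedler vector, while 0 -> 1 stays inside one side of the partition and barely moves lambda_2.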

rank_evidence_by_impact()

Ranks candidate edges by their predicted coherence impact without actually adding any edges:

let rankings = rank_evidence_by_impact(&graph, &fiedler, &candidates);
// rankings sorted by predicted_delta descending (biggest impact first)
for r in &rankings {
    println!("{}: {} -> {}, delta={:.4}, {}",
        r.weight, r.source, r.target, r.predicted_delta, r.explanation);
}

EML-Enhanced Prediction

The CausalCollapseModel adds a learned correction term to the analytical formula:

predicted = analytical_delta + eml_correction(9 features)

The 9 input features are: fiedler_u, fiedler_v, edge_weight, current_lambda2, spectral_gap, graph_density, node_count, degree_u, degree_v. The EML tree learns the higher-order corrections that the first-order perturbation formula misses.

Applications

  • Cold case analysis: Identify which evidence, if discovered, would most strengthen the causal model
  • Robotics: Predict which sensor readings would most improve the world model before acquiring them
  • Conversation: detect_conversation_cycle() identifies stuck/oscillating conversations by monitoring lambda_2 stagnation

ExoChain Integration

EML lifecycle events are chain-witnessed through the ExoChain audit trail. Each EmlModel accumulates events internally; the kernel drains and appends them to the chain.

EmlEvent Types

pub enum EmlEvent {
    Trained { model_name, samples_used, mse_before, mse_after, converged, param_count },
    Prediction { model_name, inputs_hash, output },
    Drift { model_name, predicted, actual, drift_pct },
    Saved { model_name, path, param_count },
    Loaded { model_name, path, param_count },
}

Every training event, significant prediction, drift detection, and persistence operation is chain-logged with full provenance. This means you can audit:

  • When a model was trained and whether it converged
  • What the MSE was before and after training
  • When drift was detected and by how much
  • When model state was saved/loaded and from where
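The accumulate-then-drain flow can be sketched as follows. This is not the eml-core type: Event and EventLog are hypothetical, only two variants are shown, and every field type is an assumption because the listing above gives field names only.

```rust
/// ASSUMPTION: field types are guesses; the source lists only field names.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Trained { model_name: String, samples_used: usize, mse_before: f64, mse_after: f64, converged: bool },
    Drift { model_name: String, predicted: f64, actual: f64, drift_pct: f64 },
}

/// Hypothetical per-model event buffer.
#[derive(Default)]
struct EventLog {
    pending: Vec<Event>,
}

impl EventLog {
    fn push(&mut self, event: Event) {
        self.pending.push(event);
    }

    /// Hand over all pending events and leave the internal buffer empty,
    /// so each event is appended to the chain at most once.
    fn drain_events(&mut self) -> Vec<Event> {
        std::mem::take(&mut self.pending)
    }
}

fn main() {
    let mut log = EventLog::default();
    log.push(Event::Drift {
        model_name: "coherence".to_string(),
        predicted: 0.42,
        actual: 0.50,
        drift_pct: 16.0,
    });
    log.push(Event::Trained {
        model_name: "coherence".to_string(),
        samples_used: 50,
        mse_before: 0.08,
        mse_after: 0.004,
        converged: true,
    });

    // Stand-in for the kernel appending drained events to the audit chain.
    let chain: Vec<Event> = log.drain_events();
    for event in &chain {
        println!("{:?}", event);
    }
}
```

Taking the whole buffer in one call keeps the model free of any dependency on the chain: the model only records, and the caller decides when and where events are written.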

Persistence

Trained model parameters persist to .weftos/eml-models/ as JSON files. Models are automatically saved after successful training and loaded during kernel boot. The ExoChain records both save and load events.

Performance

EML inference is dominated by exp() and ln() calls at each tree node. Benchmark results on aarch64:

| Depth | Parameters | Inference Time | Per-Output |
|---|---|---|---|
| 2 | ~20 | ~80ns | ~80ns |
| 3 | ~34 | ~149ns | ~91ns (3-head) |
| 4 | ~52 | ~272ns | ~272ns |
| 5 | ~80 | ~450ns | ~450ns |

For comparison:

  • Lanczos eigenvalue iteration: ~500us (O(k*m))
  • Full cosine similarity: ~2us (O(d))
  • EML coherence prediction: ~149ns (O(1))

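EML coherence prediction at ~149ns is roughly 3,400x faster than a ~500us Lanczos iteration (about 5,000x at the ~100ns fast-path figure). This enables coherence checking at the 10,000 Hz ECC tick rate required for robotics workloads: each tick has a 100us budget, so a Lanczos run does not fit, while an EML prediction leaves more than two orders of magnitude of headroom.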

Configuration

All EML models are fully automatic -- no manual configuration is required.

  • Models initialize untrained and use hardcoded fallbacks until enough data accumulates
  • Training happens in-band when the caller invokes model.train() after 50+ samples
  • Trained models persist to .weftos/eml-models/ and reload at boot
  • Convergence criterion: MSE < 0.01 over the training set
  • Training uses 100 random restarts followed by coordinate descent with 6 step sizes

There are no knobs to turn. The system learns the right function from operational data.
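The training recipe listed above (random restarts, then coordinate descent over several step sizes, with MSE < 0.01 as the convergence criterion) can be sketched on a toy two-parameter model. This is an illustration under assumptions, not the eml-core trainer: the six step-size values, the reading of "restarts" as picking the best of 100 random starting points, the sweep count, and the Lcg generator are all hypothetical.

```rust
/// Tiny deterministic generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    /// Uniform value in [-1, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 / (1u64 << 53) as f64) * 2.0 - 1.0
    }
}

/// Mean squared error of the line y = p[0] + p[1] * x over the samples.
fn mse(p: &[f64; 2], data: &[(f64, f64)]) -> f64 {
    let total: f64 = data
        .iter()
        .map(|&(x, y)| {
            let err = p[0] + p[1] * x - y;
            err * err
        })
        .sum();
    total / data.len() as f64
}

/// Coordinate descent: nudge one parameter at a time by +/- each step size
/// and keep the change only when the error improves.
fn coordinate_descent(p: &mut [f64; 2], data: &[(f64, f64)], steps: &[f64], sweeps: usize) -> f64 {
    let mut best = mse(p, data);
    for _ in 0..sweeps {
        for i in 0..p.len() {
            for &s in steps {
                for delta in [s, -s] {
                    let old = p[i];
                    p[i] = old + delta;
                    let trial = mse(p, data);
                    if trial < best {
                        best = trial;
                    } else {
                        p[i] = old; // revert a non-improving move
                    }
                }
            }
        }
    }
    best
}

/// Returns the fitted parameters and the final MSE.
fn train(data: &[(f64, f64)]) -> ([f64; 2], f64) {
    let steps = [1.0, 0.3, 0.1, 0.03, 0.01, 0.001]; // six step sizes (values assumed)
    let mut rng = Lcg(42);
    let mut best_p = [0.0, 0.0];
    let mut best_err = f64::INFINITY;
    // Random restarts: keep the best of 100 random starting points.
    for _ in 0..100 {
        let p = [rng.next_f64(), rng.next_f64()];
        let err = mse(&p, data);
        if err < best_err {
            best_err = err;
            best_p = p;
        }
    }
    let err = coordinate_descent(&mut best_p, data, &steps, 200);
    (best_p, err)
}

fn main() {
    // Toy target: y = 1 + 2x sampled on [0, 1).
    let data: Vec<(f64, f64)> = (0..50)
        .map(|i| {
            let x = i as f64 / 50.0;
            (x, 1.0 + 2.0 * x)
        })
        .collect();
    let (p, err) = train(&data);
    println!("p = {:?}, mse = {:.6}, converged = {}", p, err, err < 0.01);
}
```

No gradients are computed anywhere: each move is accepted or rejected purely by re-evaluating the error, which is why this style of training stays practical only for small parameter counts.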
