Toy EML-Attention — Iteration 0 (pure-JS demo)

Live forward-pass demonstrator for the Iteration 0 EML-Attention block. This runs a pure-JS mirror of the Rust ToyEmlAttention entirely in the browser — the real WASM build lands in 0.6.9. See the architecture page for details and the Colab notebook for a trainable Python reference that exports JSON directly loadable by the Rust EmlModel.

Configuration

d_model: 8 · d_k: 4 · seq_len: 4 · depth: 3 · seed: 42

Architecture summary

Submodel	Shape	Params
q_model	(4·8) → (4·4)	672
k_model	(4·8) → (4·4)	672
v_model	(4·8) → (4·4)	672
softmax_model	4 → 4	56
out_model	(4·4) → (4·8)	832
total	—	2904
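
The arithmetic in the table can be checked with a short sketch. The widths and parameter counts below are copied from the table; the tuple layout and variable names are illustrative and are not taken from the Rust or JS source. Treating the softmax submodel's width as seq_len (one row of attention scores) is an assumption, since d_k is also 4 in this configuration.

```python
# Configuration values from this page.
SEQ_LEN, D_MODEL, D_K = 4, 8, 4

# (name, input width, output width, parameter count), copied from the table.
# q/k/v/out operate on flattened seq_len x dim vectors.
# Assumption: softmax_model is seq_len wide (one row of attention scores).
SUBMODELS = [
    ("q_model", SEQ_LEN * D_MODEL, SEQ_LEN * D_K, 672),
    ("k_model", SEQ_LEN * D_MODEL, SEQ_LEN * D_K, 672),
    ("v_model", SEQ_LEN * D_MODEL, SEQ_LEN * D_K, 672),
    ("softmax_model", SEQ_LEN, SEQ_LEN, 56),
    ("out_model", SEQ_LEN * D_K, SEQ_LEN * D_MODEL, 832),
]

# Sum the per-submodel counts to reproduce the "total" row.
total_params = sum(params for _, _, _, params in SUBMODELS)

for name, n_in, n_out, params in SUBMODELS:
    print(f"{name:14s} {n_in:3d} -> {n_out:3d}  {params:4d} params")
print(f"{'total':14s}             {total_params:4d} params")
```

The three projection submodels account for 2016 of the 2904 parameters; the sketch does not attempt to derive the per-submodel counts, which depend on the EML tree structure described on the architecture page.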

Forward-pass performance (pure JS, 64 iterations)

mean: 75.38 µs · p99: 785.77 µs

Rust p99 target: ≤ 5 µs at (seq_len=4, d_model=8, d_k=4, depth=3). JS is slower than the Rust build because of boxed-number overhead and the lack of SIMD, so treat these numbers as an upper bound on what the Rust build should take.
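
A minimal sketch of how a mean and p99 like the ones above could be computed from per-iteration timings. The page does not say which percentile method the demo uses; nearest-rank is assumed here, and the workload is a placeholder rather than the real forward pass. Under nearest-rank, the p99 of 64 samples is simply the slowest sample, so a single slow iteration (a warm-up run, for instance) would be enough to push the p99 far above the mean.

```python
import math
import time

def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile of samples (pct in (0, 100])."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def benchmark(fn, iterations=64):
    """Time fn() `iterations` times; return (mean_us, p99_us)."""
    samples_us = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples_us.append((time.perf_counter_ns() - start) / 1_000)
    mean_us = sum(samples_us) / len(samples_us)
    return mean_us, nearest_rank_percentile(samples_us, 99)

# Placeholder workload standing in for the forward pass (hypothetical).
mean_us, p99_us = benchmark(lambda: sum(i * i for i in range(100)))
print(f"mean: {mean_us:.2f} µs · p99: {p99_us:.2f} µs")
```

Discarding a few warm-up iterations or raising the iteration count would make the p99 less sensitive to one outlier, at the cost of a longer page load.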

Attention matrix (softmax output)

Each row sums to 1, up to display rounding. With untrained projections the attention pattern is near-uniform, so every weight sits close to 1/seq_len = 0.25. After training, expect a diagonal or otherwise sparser pattern, depending on the task.

row 0: [0.256, 0.260, 0.248, 0.237]
row 1: [0.257, 0.260, 0.247, 0.235]
row 2: [0.257, 0.259, 0.248, 0.236]
row 3: [0.257, 0.260, 0.247, 0.236]
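
The two properties claimed for this matrix (rows summing to 1, weights near uniform) can be checked directly from the displayed values. The sketch below also runs a standard softmax over a set of small scores to illustrate why near-zero scores give a near-uniform row. That softmax is a plain reference implementation used as a stand-in, not the learned softmax_model, and the example scores are hypothetical.

```python
import math

# Attention rows as displayed on this page (rounded to 3 decimals).
ATTENTION = [
    [0.256, 0.260, 0.248, 0.237],
    [0.257, 0.260, 0.247, 0.235],
    [0.257, 0.259, 0.248, 0.236],
    [0.257, 0.260, 0.247, 0.236],
]

uniform = 1 / len(ATTENTION[0])  # 0.25 for seq_len = 4

row_sums = [sum(row) for row in ATTENTION]
max_dev = max(abs(w - uniform) for row in ATTENTION for w in row)

def softmax(scores):
    """Numerically stable reference softmax (not the learned softmax_model)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical small scores, of the kind untrained projections might produce.
example = softmax([0.02, 0.04, -0.01, -0.05])

print("row sums:", [round(s, 3) for s in row_sums])
print("max deviation from uniform:", round(max_dev, 3))
print("softmax of small scores:", [round(w, 3) for w in example])
```

Row sums can differ from 1 in the last displayed digit because each weight is rounded independently before display.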

Output (last forward pass)

First 16 dims of the 4·8-wide output.

[0.7180, 0.6860, 0.6080, 0.7003, 0.6669, 0.7136, 0.7378, 0.5298, 0.6021, 0.7797, 0.6074, 0.6577, 0.7270, 0.6314, 0.6376, 0.7434, …]

Next steps

Version 0.6.9 will swap this JS port for the real Rust/WASM build behind the same interface.
Training (gradient-free coordinate descent) runs in Rust via ToyEmlAttention::train; this page demonstrates forward pass only.
See the Python Colab notebook to train offline and export JSON for the Rust loader.