Reading time: 10 min Prerequisites: Know what a save file is uwu Survival rate: 100% (the goose is friendly)
The Problem (Why You Should Care)
Imagine you’re playing an RPG and the game tracks EVERY input you’ve ever made.
TRANSFORMER = that annoying save system:
- saves EVERY button press
- save file grows FOREVER
- 1M inputs = your SSD dies
- checking old saves = slower and slower
RWKV = normal save file:
- fixed size notebook
- overwrites old stuff
- 1M inputs = same size as 1
- constant speed forever
but v7 has something special…
The Video Game Analogy
v6 = Notebook with Pen Only
YOUR INVENTORY NOTEBOOK (v6):
┌─────────────────────────┐
│ sword [fade...fade] │ ← old notes fade
│ potion [fading...] │
│ NEW: helmet │ ← can only ADD
└─────────────────────────┘
Problem: wrote "10 potions" but you used 3
Can't fix it! It just fades away eventually.
v7 = Notebook with Pen AND Eraser
YOUR INVENTORY NOTEBOOK (v7):
┌─────────────────────────┐
│ sword [fade...fade] │ ← old notes fade
│ potion: 10 → 7 │ ← CAN EDIT!
│ NEW: helmet │ ← can add
└─────────────────────────┘
The eraser = "delta rule"
You can CORRECT mistakes, not just wait for them to fade!
Even Simpler: Git Analogy
v6 = append-only log
- git add, git add, git add
- can't amend
- old stuff just gets buried
v7 = full git
- git add
- git commit --amend ← THIS IS THE DELTA RULE
- git add
v7 can REWRITE history (safely)
The Delta Rule = Game Trainer
you know those game trainers that modify memory?
NORMAL GAME:
- state = whatever happened
GAME WITH TRAINER:
- state = whatever happened
- trainer: "wait HP is wrong, fixing..."
- state = corrected value
RWKV-7:
- memory = fade old stuff + ADD new stuff
- delta rule: "wait that's wrong, fixing..."
- memory = corrected value
the model is basically running cheat engine on itself during inference uwu
The Formula (Don’t Panic)
OLD (v6):
notebook = fade × old_page + new_note
NEW (v7):
notebook = fade × old_page + ERASER(old_page) + new_note
↑ ↑ ↑
keep some fix mistakes add new
that middle part (ERASER) is doing mini training while running.
like if your game learned your playstyle mid-session and adjusted.
Why O(n) vs O(n²) Matters
Transformer = Looking at EVERYONE’s inventory
Party of 4: check 4 × 4 = 16 comparisons
Party of 100: check 100 × 100 = 10,000 comparisons
Party of 1M: check 1M × 1M = 💀💀💀
"Hey does anyone have a potion?"
*checks literally everyone, every time*
RWKV = Just Check Your Own Notebook
Party of 4: check 1 notebook = 1 lookup
Party of 100: check 1 notebook = 1 lookup
Party of 1M: check 1 notebook = 1 lookup
"Do I have a potion?"
*checks personal notebook*
"yeah it says 7 here"
Real Numbers That Matter
TESTED ON GPU:
- Model: RWKV-7 2.9B "Goose"
- Speed: 48 tokens/sec
- Memory: fixed state size
- Context: infinite (theoretically)
TRANSFORMER EQUIVALENT:
- would need 48GB+ for same context
- slows down with longer input
- hard limit on context
The State = Your .git Folder
you don't see .git but it's THERE
tracking everything
fixed size (mostly)
contains full history (compressed)
RWKV state = same concept
- hidden tensor [B, H, K, V]
- fixed size always
- contains "compressed history"
- you don't see it but model uses it
v_first = Cross-Save Data
some games let later save files access early game data:
LAYER 0 (early game):
v_first = "remembered the tutorial sword"
LAYER 15 (late game):
v = current_value + blend(v_first, current)
"hey remember that tutorial sword?
it's relevant again for this boss"
this is why v7 can track things across LONG contexts
The Implementation
// the actual loop is just:
for t in 0..seq_len {
// 1. Fade (things get less important over time)
state = decay * state;
// 2. Delta rule (FIX mistakes)
state = state + correction_term; // ← THE MAGIC
// 3. Add new (write new notes)
state = state + new_key_value;
// 4. Read (what do we know?)
output[t] = read_from(state);
}
that’s it. that’s the whole thing uwu
The Decay Formula
DECAY = how fast old stuff fades
too fast (0.1): forgets EVERYTHING
too slow (0.99): remembers GARBAGE
v7 sweet spot: 0.545 to 1.0
- bounded decay
- can't explode or vanish
- goldilocks zone
like a game that auto-clears old quest markers but not too aggressively
TL;DR
| Thing | Analogy |
|---|---|
| State | Your notebook / .git folder |
| Decay | Old notes fading |
| Delta rule | Eraser / git amend |
| v_first | Cross-save data |
| O(n) | Just check your notebook |
| O(n²) | Ask everyone at the party |
What We’re Releasing
RWKV-7 "Goose" for Candle (Rust):
- Full implementation
- GPU accelerated
- 48 tok/s tested
- Works with HuggingFace weights
Files:
- candle-transformers/src/models/rwkv_v7.rs
- candle-examples/examples/rwkv7/
Run it:
cargo run --example rwkv7 --release --features cuda -- \
--model /path/to/rwkv7 \
--prompt "The meaning of life is"
You Survived!
now you understand:
- state = notebook (fixed size, tracks everything)
- delta rule = eraser (can fix mistakes)
- O(n) = just check yourself (not everyone else)
- v_first = cross-save data (early info for later)
the goose flies in Rust uwu
rune.みんな