Design Decisions
A deep dive into the decisions behind Mantle's rating system, from Glicko-2 engine selection to climbing-specific adaptations.
Design Philosophy
Most climbing apps treat logging as the end goal. We treat it as the starting point. Every feature in Mantle is derived from one simple action: logging a climb in under five seconds.
We made a few foundational decisions early that shaped everything else. First, ratings update per-session, not per-climb. A session is a natural unit in climbing - you warm up, push your limits, and cool down. Evaluating performance across the whole session produces more stable, meaningful ratings than reacting to individual climbs.
Second, bouldering and sport climbing are fully independent tracks. They use different muscles, different skills, and different grade scales. A V7 boulderer might climb 5.11a on ropes - or 5.13a. Combining them into one number would be meaningless.
Third, the system only rewards real climbing behavior. There are no shortcuts to inflate your rating. Repeating the same easy grade gives diminishing returns. Top-roping gets a discount versus leading. Every ascent type, from onsight to hangdog, has a carefully calibrated weight.
Why Glicko-2
Choosing the right rating engine was the most important technical decision we made. We didn't just pick one. We built two complete engines and ran 1,188 simulations to compare them.
We simulated climber profiles ranging from beginners to elite athletes, each with realistic session patterns, strengths, and weaknesses. Both Glicko-2 and an IRT/Rasch model were built and tested against these profiles.
The results were remarkably close - maximum divergence of just 16 rating points, with 100% agreement on rank assignments. But Glicko-2 won on three key factors:
| Feature | Elo | IRT / Rasch | Glicko-2 |
|---|---|---|---|
| Tracks confidence | ✕ | Partial | ✓ |
| Tracks volatility | ✕ | ✕ | ✓ |
| Confidence decays with inactivity | ✕ | ✕ | ✓ |
| Session-based periods | Partial | ✓ | ✓ |
| Well-established in competitive systems | ✓ | ✕ | ✓ |
Glicko-2's rating deviation, its measure of uncertainty, naturally grows when a climber is inactive, which means the system becomes less certain about your rating over time if you stop climbing, and more responsive when you return. Volatility tracking identifies whether a climber performs consistently or erratically, further tuning how much each session moves their rating.
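As a rough illustration of that inactivity behavior, standard Glicko-2 widens the rating deviation once per idle rating period by folding the volatility into it. The sketch below assumes that rule applies unchanged; the function name, the example values, and what counts as a "period" in Mantle are assumptions, not details from our implementation.

```python
import math

# Conversion factor between rating points and the Glicko-2 internal scale.
GLICKO2_SCALE = 173.7178


def inflate_rd(rd: float, volatility: float, idle_periods: int) -> float:
    """Return the rating deviation after `idle_periods` periods with no climbs.

    Applies the standard Glicko-2 rule phi' = sqrt(phi^2 + sigma^2) once per
    idle period. How Mantle defines a period is an assumption here.
    """
    phi = rd / GLICKO2_SCALE
    for _ in range(idle_periods):
        phi = math.sqrt(phi * phi + volatility * volatility)
    return phi * GLICKO2_SCALE


if __name__ == "__main__":
    # Hypothetical climber: RD of 50 points, volatility 0.06.
    for periods in (0, 5, 10, 20):
        print(periods, round(inflate_rd(50.0, 0.06, periods), 1))
```

The deviation only ever grows while the climber is away, so the first session back carries more weight than it would have before the break.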
Fixed-Opponent Adaptation
Glicko-2 was designed for head-to-head matchups (chess, gaming). We adapted it by treating each climbing route as a fixed-rated opponent. A V5 boulder “plays” at 1400 rating points. When you send it, you beat that opponent. When you fall, you lose. The route's rating never changes - only yours does.
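A minimal sketch of the fixed-opponent idea, assuming each route has a rating deviation of zero (so the Glicko-2 weighting term g is 1) and holding volatility constant instead of running the full Glicko-2 volatility iteration. The function name and example numbers are hypothetical, and this is a simplification rather than the production engine.

```python
import math

SCALE = 173.7178  # Glicko-2 scale factor


def update_vs_routes(rating, rd, volatility, results):
    """Simplified one-session update against fixed-rated routes.

    `results` is a list of (route_rating, score) pairs, score in [0, 1].
    Routes never change and are treated as having RD = 0 (assumption).
    Volatility is held fixed, which the full Glicko-2 algorithm does not do.
    """
    mu = (rating - 1500) / SCALE
    phi = rd / SCALE
    inv_v = 0.0      # inverse of the estimated variance
    surprise = 0.0   # sum of (actual score - expected score)
    for route_rating, score in results:
        mu_j = (route_rating - 1500) / SCALE
        expected = 1 / (1 + math.exp(-(mu - mu_j)))
        inv_v += expected * (1 - expected)
        surprise += score - expected
    phi_star = math.sqrt(phi ** 2 + volatility ** 2)
    phi_new = 1 / math.sqrt(1 / phi_star ** 2 + inv_v)
    mu_new = mu + phi_new ** 2 * surprise
    return 1500 + SCALE * mu_new, SCALE * phi_new


if __name__ == "__main__":
    # A 1400-rated climber sends one V5 (1400), then separately falls on one.
    print(update_vs_routes(1400, 100, 0.06, [(1400, 1.0)]))
    print(update_vs_routes(1400, 100, 0.06, [(1400, 0.0)]))
```

Only the climber's rating and deviation are returned; nothing is written back to the route, which is the whole point of the adaptation.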
The 9-Stage Pipeline
Before any climb affects your rating, it passes through a 9-stage processing pipeline. Each stage addresses a specific climbing reality that a raw rating system wouldn't handle correctly.
Grade Mapping
The gap between climbing grades isn't constant. It widens as grades get harder. The jump from V0 to V1 might take a few weeks. The jump from V10 to V11 might take years. Our grade-to-rating mapping reflects this.
Each grade maps to a fixed rating value that serves as the route's “opponent rating.” The gaps between consecutive grades grow steadily, starting at 100 points and increasing by 20 points per grade.
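The gap rule can be sketched in a few lines. This assumes the first 100-point gap is the one between V0 and V1, and it infers a V0 base of 700 from the example values quoted elsewhere on this page (V2 = 920, V5 = 1400, V7 = 1820); the function name and the base are illustrative assumptions, not the published mapping table.

```python
def v_grade_rating(grade: int, base: int = 700,
                   first_gap: int = 100, gap_step: int = 20) -> int:
    """Return the fixed opponent rating for a V-grade.

    Gaps between consecutive grades start at `first_gap` and grow by
    `gap_step` per grade. `base` (the V0 rating) is inferred, not official.
    """
    if grade < 0:
        raise ValueError("grade must be V0 or harder")
    rating = base
    for step in range(grade):
        rating += first_gap + gap_step * step
    return rating


if __name__ == "__main__":
    for g in range(0, 11):
        gap = v_grade_rating(g + 1) - v_grade_rating(g)
        print(f"V{g}: {v_grade_rating(g)} (gap to V{g + 1}: {gap})")
```

Because the gaps grow by a constant amount, the ratings follow a quadratic curve: each step up is larger than the one before it.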
V-Scale Rating Gaps
This accelerating curve means the rating system naturally distinguishes between grade levels at every ability range. A V2 climber sending a V3 and a V9 climber sending a V10 both represent meaningful jumps, but the V10 send moves the rating more because the gap is genuinely larger.
Four Grade Scales, One Source of Truth
Mantle supports V-scale, YDS, Font, and French grades. All grade-to-rating mappings are centrally managed, so every part of Mantle stays in sync. This means we can adjust grade mappings without shipping app updates.
Ascent Scoring
Not all sends are equal. Onsighting a route on your first look is harder than sending it after ten attempts with beta from a friend. Our scoring system captures these nuances.
Fractional Type Scores
We chose a fractional scoring system over binary (send/no-send) to differentiate between ascent styles. Each type gets a carefully calibrated multiplier.
Context-Aware Types
The app shows different ascent types depending on discipline and protection. Bouldering offers onsight, flash, send, repeat, and attempt. Lead climbing adds redpoint and pinkpoint. Top-rope uses clean instead of redpoint. This prevents invalid combinations and keeps logging fast.
Why Diminishing Returns Matter
Without diminishing returns, a climber could send twenty V3s in a session and gain the same rating boost as someone who sent a V7. The linear falloff (80%, 60%, 40% for repeated grades) rewards pushing into harder territory rather than farming easy sends.
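The falloff can be sketched as a per-grade counter within a session. This assumes the first send of a grade counts at 100% and that the weight stays at 40% for any further repeats; both of those, along with the function and parameter names, are assumptions on top of the 80% / 60% / 40% sequence.

```python
from collections import Counter


def repeat_weights(grades, step=0.2, floor=0.4):
    """Return a weight for each send in a session, in order.

    The n-th repeat of the same grade loses `step` per prior send of that
    grade, never dropping below `floor` (the floor is an assumption).
    """
    seen = Counter()
    weights = []
    for grade in grades:
        weight = max(floor, 1.0 - step * seen[grade])
        weights.append(round(weight, 2))
        seen[grade] += 1
    return weights


if __name__ == "__main__":
    session = ["V3", "V3", "V3", "V3", "V7"]
    print(repeat_weights(session))  # [1.0, 0.8, 0.6, 0.4, 1.0]
```

The V7 at the end of the session still counts in full, because the counter is tracked per grade rather than per session.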
The Impact Curve
The impact weighting is what makes the whole system feel fair. A 1400-rated climber sending a V5 (1400 points) gets moderate impact. Sending a V7 (1820 points) gets high impact, because the climber is punching above their weight. But sending a V2 (920 points) has almost zero impact, because it proves nothing new.
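One way to see the shape of this curve is the logistic expected score that Glicko-2 reduces to against an opponent with zero rating deviation. The "surprise" of a send is one minus the probability the system already assigned to it. This is an illustration under that assumption, with a hypothetical function name; the exact impact weighting may differ.

```python
def send_surprise(climber_rating: float, route_rating: float) -> float:
    """Return 1 - expected score: how unexpected a successful send is."""
    expected = 1 / (1 + 10 ** ((route_rating - climber_rating) / 400))
    return 1 - expected


if __name__ == "__main__":
    # A 1400-rated climber against the three routes from the text.
    for name, route in (("V5", 1400), ("V7", 1820), ("V2", 920)):
        print(name, round(send_surprise(1400, route), 2))
    # Roughly 0.50 for the V5, 0.92 for the V7, and 0.06 for the V2.
```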
The Rank System
Raw numbers are precise but impersonal. Ranks give climbers an identity, a tier that reflects where they are on their climbing journey. The seven tier names are drawn from climbing culture.
Prospect is just starting out. Scrambler is finding their feet. Sender is consistent. Crusher is strong. Projector is working hard problems. Dirtbag is living the climbing life. Stonemasters is elite, named after the legendary Yosemite climbing crew of the 1970s.
| Rank | Boulder | Sport |
|---|---|---|
| Prospect | VB - V1 | 5.5 - 5.7 |
| Scrambler | V1 - V3 | 5.8 - 5.9 |
| Sender | V3 - V5 | 5.10a - 5.11a |
| Crusher | V5 - V7 | 5.11a - 5.11d |
| Projector | V7 - V9 | 5.11d - 5.12c |
| Dirtbag | V9 - V11 | 5.12c - 5.13a |
| Stonemasters | V11+ | 5.13a+ |
The boundaries are asymmetric between disciplines because the grade scales don't map one-to-one. A V5 boulderer and a 5.11a sport climber are at roughly the same tier, but the exact rating numbers differ.
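For the bouldering track, a rank lookup could be sketched as a threshold search. This assumes each tier starts exactly at the rating of its lower boundary grade, and it reuses the rating values implied by the V2 = 920, V5 = 1400, and V7 = 1820 examples; the thresholds and names below are derived assumptions, not the published cut-offs.

```python
from bisect import bisect_right

RANKS = ["Prospect", "Scrambler", "Sender", "Crusher",
         "Projector", "Dirtbag", "Stonemasters"]

# Lower boundary grade of each tier after Prospect (V1, V3, ..., V11).
BOUNDARY_GRADES = [1, 3, 5, 7, 9, 11]


def v_grade_rating(grade: int) -> int:
    """Closed form of 'gaps start at 100 and grow by 20', with an inferred V0 base of 700."""
    return 700 + 100 * grade + 10 * grade * (grade - 1)


# Assumed thresholds: [800, 1060, 1400, 1820, 2320, 2900]
THRESHOLDS = [v_grade_rating(g) for g in BOUNDARY_GRADES]


def boulder_rank(rating: float) -> str:
    """Return the tier whose rating band contains `rating`."""
    return RANKS[bisect_right(THRESHOLDS, rating)]


if __name__ == "__main__":
    for r in (750, 1399, 1400, 2950):
        print(r, boulder_rank(r))
```

The sport track would need its own threshold list, since the same tier corresponds to different rating numbers in each discipline.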
Safety Cap
No session can change your rating by more than 500 points. This prevents a single outlier session, whether exceptionally good or bad, from wildly distorting your rank. If the cap triggers, the system adjusts proportionally to stay internally consistent.
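The cap itself amounts to clamping the session's rating change. The sketch below shows only that clamp; how the remaining values are adjusted proportionally when the cap triggers is not modeled here, and the function name is an assumption.

```python
def cap_session_change(old_rating: float, proposed_rating: float,
                       max_change: float = 500.0) -> float:
    """Limit a single session's rating change to +/- `max_change` points."""
    delta = proposed_rating - old_rating
    capped = max(-max_change, min(max_change, delta))
    return old_rating + capped


if __name__ == "__main__":
    print(cap_session_change(1400, 2100))  # 1900.0, gain capped at +500
    print(cap_session_change(1400, 1450))  # 1450.0, within the cap
    print(cap_session_change(1400, 700))   # 900.0, loss capped at -500
```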
Integrity & Trust
A rating system is only as good as the data behind it. If anyone can fabricate climbs, ratings become meaningless. We built multiple layers of verification without making the logging experience slower.