March 28, 2026 · 7 min read

How Chess Ratings Inspired a Vocabulary App

ELO ratings gave chess a single number that means something. The same mathematics, it turns out, can do the same for your vocabulary, and for the words themselves.

A Number That Earns Its Meaning

Arpad Elo was a Hungarian-American physics professor who loved chess. Around 1960, frustrated by the crude ranking systems of his era, he devised something elegant: a formula that could assign every chess player a single number based on performance and update that number after every game.

The idea was beautifully simple. Before each game, the system predicts each player's expected score from the gap between their two ratings. If you beat someone rated higher than you, your rating rises more than it would for beating someone rated lower. Lose to a weaker player, and you fall further than you would for losing to a stronger one. Over time, the number converges on something real: not an opinion, not a title, but a calibrated measure of skill.
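For readers who want to see the arithmetic, here is a minimal Python sketch of the standard Elo formulas: an expected score computed from the rating gap, then an update proportional to how much the result surprised the prediction. The K-factor of 32 is an assumed, commonly used value; real federations choose their own.

```python
# Minimal sketch of the classic Elo calculation.
# K = 32 is an assumed, commonly cited value; federations use different K-factors.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move a rating toward the result; actual is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (actual - expected)


if __name__ == "__main__":
    me, stronger, weaker = 1500.0, 1700.0, 1300.0
    # Beating a stronger player earns more than beating a weaker one.
    gain_vs_stronger = update(me, expected_score(me, stronger), 1.0) - me
    gain_vs_weaker = update(me, expected_score(me, weaker), 1.0) - me
    # Losing to a weaker player costs more than losing to a stronger one.
    loss_vs_weaker = me - update(me, expected_score(me, weaker), 0.0)
    loss_vs_stronger = me - update(me, expected_score(me, stronger), 0.0)
    print(f"win vs 1700: +{gain_vs_stronger:.1f}, win vs 1300: +{gain_vs_weaker:.1f}")
    print(f"loss vs 1300: -{loss_vs_weaker:.1f}, loss vs 1700: -{loss_vs_stronger:.1f}")
```

With these numbers, a 1,500 player gains about 24 points for beating a 1,700 but only about 8 for beating a 1,300, and the losses mirror that.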

Magnus Carlsen's peak ELO of 2882 is not a badge or a certificate. It is the accumulated signal of thousands of games played against thousands of opponents. It means something precise.

I kept thinking about that word: precise. Most measurements of vocabulary are anything but.

The Problem With Vocabulary Measurement

Ask someone about their vocabulary and you get vague answers. “I read a lot.” “I did well in English at school.” “I know most words.” These are impressions, not measurements.

Even formal tests are blunt instruments. A GRE verbal score tells you roughly where you rank among test-takers on a particular day. It does not tell you which words you know and which you do not. It does not tell you whether your vocabulary improved or declined last month. It does not give you a number you can point to and say: this is where I stand, precisely, right now.

ELO can.

Applying ELO to Vocabulary

In Lemmerly, every word has a rating. Not an arbitrary difficulty label, but a rating earned through thousands of interactions with real users. A word rated at 1,200 ELO is answered correctly about half the time by users rated 1,200, just as two equally rated chess players are each expected to score half a point. A word rated at 1,600 is answered correctly far less often by those same users.

Every user has a rating too. When you answer a question, the system runs the same mathematics as a chess match: it predicts your probability of getting that word right from the gap between your rating and the word's rating. If you beat the prediction by getting a hard word right, your rating rises more than it would for getting an easy word right. If you miss an easy word, your rating falls further than it would for missing a hard one.
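Here is a hypothetical Python sketch of that idea, treating each question as a game between a user and a word. The function names and the K-factor of 24 are illustrative assumptions, not Lemmerly's actual code or parameters.

```python
# Hypothetical sketch: each question is a "game" between a user and a word.
# K_USER = 24 is an illustrative assumption, not Lemmerly's actual parameter.

K_USER = 24.0


def p_correct(user_rating: float, word_rating: float) -> float:
    """Predicted probability that the user answers this word correctly."""
    return 1.0 / (1.0 + 10 ** ((word_rating - user_rating) / 400.0))


def rate_answer(user_rating: float, word_rating: float, correct: bool) -> float:
    """Return the user's new rating after one answer."""
    actual = 1.0 if correct else 0.0
    return user_rating + K_USER * (actual - p_correct(user_rating, word_rating))


if __name__ == "__main__":
    user = 1200.0
    for word, correct in [(1600.0, True), (800.0, True), (800.0, False), (1600.0, False)]:
        change = rate_answer(user, word, correct) - user
        print(f"word {word:.0f} {'right' if correct else 'missed'}: "
              f"predicted {p_correct(user, word):.0%}, change {change:+.1f}")
```

For a 1,200-rated user, a 1,600 word carries a predicted accuracy of about 9%, so getting it right adds roughly 22 points, while an 800 word (about 91% predicted) adds only about 2. The misses mirror those amounts.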

The result is a number that earns its meaning the same way a chess rating does: through thousands of calibrated interactions, each one a small prediction tested against reality.

A 1,200 ELO on Lemmerly means you handle everyday formal vocabulary comfortably. A 1,500 means you can read Middlemarch without a dictionary. A 1,800 means you are the person your friends text when they cannot remember a word.

The Living Corpus

There is a second layer to this that took me time to appreciate fully. In Lemmerly, it is not just your ELO that moves — the words' ELOs move too.

If a word rated at 1,400 is routinely answered correctly by users rated 900, something is wrong: the formula expects them to get it right only about 5% of the time. Either the word is easier than we thought, or the answer choices are too obvious. The system notices. The word's rating drifts downward until it finds its true level.

The reverse is also true. A word we thought was easy keeps tripping up people who should know it — its ELO climbs.
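A two-sided version of the update makes that drift visible. In this hypothetical Python sketch, a correct answer counts as a win for the user and a loss for the word; the K-factors and the "four right, one wrong" answer pattern are assumptions chosen for illustration.

```python
# Hypothetical two-sided Elo update: each answer moves both the user and the word.
# K values and the answer pattern below are illustrative assumptions.


def p_correct(user_rating: float, word_rating: float) -> float:
    """Predicted chance that a user at user_rating answers the word correctly."""
    return 1.0 / (1.0 + 10 ** ((word_rating - user_rating) / 400.0))


def play(user_rating: float, word_rating: float, correct: bool,
         k_user: float = 24.0, k_word: float = 16.0) -> tuple[float, float]:
    """Return (new_user_rating, new_word_rating) after one answer.

    A correct answer is a win for the user and a loss for the word.
    """
    surprise = (1.0 if correct else 0.0) - p_correct(user_rating, word_rating)
    return user_rating + k_user * surprise, word_rating - k_word * surprise


def drift(word_rating: float, user_rating: float, pattern: list[bool], cycles: int) -> float:
    """Replay an answer pattern from fresh users at user_rating; return the word's rating."""
    for _ in range(cycles):
        for correct in pattern:
            _, word_rating = play(user_rating, word_rating, correct)
    return word_rating


if __name__ == "__main__":
    # 900-rated users answer a word seeded at 1,400 correctly four times out of five.
    final = drift(1400.0, 900.0, [True, True, True, True, False], cycles=100)
    print(f"word rating after 500 answers: {final:.0f}")
```

A word seeded at 1,400 that 900-rated users get right four times out of five settles near the rating at which the formula predicts 80% accuracy for them, about 660. Feed it a stream of misses from strong users instead and the same code pushes its rating up.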

This makes Lemmerly's corpus fundamentally different from a word list. A word list is static — “these are the 500 GRE words.” But difficulty is not static. Language changes. The questions that trip people up shift over time. A word that was obscure in 1990 might be common today. A distractor that seemed clearly wrong might, in practice, be genuinely confusing.

Every session on Lemmerly is a stream of data points: every correct answer, every miss, every skip. The ratings converge. Hard questions climb to their true level; easy ones settle to theirs. The corpus becomes more accurate with every user who plays.
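One common way to make ratings settle rather than jitter forever is to shrink the K-factor as evidence accumulates; chess federations do something similar, giving newer players a larger K. The hypothetical Python sketch below applies that idea to words. The `Word` type, the decay schedule, and its parameters are assumptions, not Lemmerly's actual rule.

```python
# Hypothetical sketch: a word's K-factor shrinks as it collects answers, so new
# words move quickly and well-measured words settle. The schedule and its
# parameters (k_max, k_min, midpoint) are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Word:
    rating: float
    answers: int = 0  # how many answers this word has been rated on so far


def k_for(word: Word, k_max: float = 40.0, k_min: float = 8.0, midpoint: int = 50) -> float:
    """K starts at k_max and decays toward k_min; it is halfway there after `midpoint` answers."""
    return k_min + (k_max - k_min) * midpoint / (midpoint + word.answers)


def record_answer(word: Word, user_rating: float, correct: bool) -> None:
    """Update the word in place; a correct answer counts as a loss for the word."""
    expected = 1.0 / (1.0 + 10 ** ((word.rating - user_rating) / 400.0))
    actual = 1.0 if correct else 0.0
    word.rating -= k_for(word) * (actual - expected)
    word.answers += 1


if __name__ == "__main__":
    for n in (0, 50, 1000):
        print(f"after {n} answers: K = {k_for(Word(1400.0, answers=n)):.1f}")
```

Under this schedule, K falls from 40 for a brand-new word to 24 after 50 answers and to about 9.5 after 1,000. The trade-off is the usual one: a large K adapts quickly but is noisy, while a small K is stable but slower to notice when a word's real difficulty shifts.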

We have 58,000 words right now. Each one has a rating. Each rating is earned.

What Your Number Means

We built a title ladder — eight levels from Initiate to Lex Archmaster — because a single number without context is harder to hold in your mind. But the number is what matters. It is honest. It goes up when you improve and down when you do not. It is calibrated against every other user who has ever played.

Most people who take the placement test are surprised by their score — sometimes pleasantly, sometimes not. That is what honest measurement feels like. It is also the beginning of improvement.

Arpad Elo gave chess players a mirror. Lemmerly is attempting something similar for vocabulary — a mirror precise enough to be useful, calibrated enough to trust.

What is your number?

Find out exactly where your vocabulary stands.
