Most People Watch the Wrong Tokens

A day after I put the first version of Smoower.Minified online, someone asked the only question that matters: how do you actually know it saves anything? Fair. I had been staring at token counts for a week by then, but "Tim eyeballed it and felt good" is not a number anyone should trust, least of all me.

So before I write another word selling the idea, here is the boring part. How I measured it, what held up, and the swaps I was sure about that turned out to do absolutely nothing.

The token nobody watches

Almost every "save tokens" conversation is about input. Trim the system prompt. Cache the context. Stop resending the whole file. All real, all worth doing.

But input is the cheap side. On the current Claude models, output tokens cost several times more than input, and there is a second cost people forget: output is generated one token at a time, sequentially, so the number of output tokens tracks wall-clock generation time almost linearly. Halve the output and you roughly halve how long you sit there watching the cursor.

Code generation is almost entirely an output problem. The model reads your prompt once and then emits hundreds of lines. That is the side I wanted to attack, and it is the side prompt caching does nothing for.

Measure with the real tokenizer, not a convenient one

My first instinct was to use tiktoken with o200k_base, because it runs offline and it is fast. That was a mistake, or at least a trap I almost walked into.

Claude does not tokenize C# the way tiktoken does. In my runs Claude's tokenizer comes out roughly 1.7 to 1.8 times higher on the same code, and more importantly it flattens the spread between "clever" encodings that tiktoken makes look dramatic. If I had published the tiktoken numbers, the savings would have looked bigger than they really are.

So every headline number in the findings now comes from Claude's actual count_tokens endpoint (it is free, by the way, which removes any excuse). Tiktoken stays as an offline sanity check, never as the source of a claim. If a tool tells you it saves X percent and X came from the wrong tokenizer, X is fiction.
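For anyone who wants to reproduce a count, here is a minimal sketch of calling the token-counting endpoint from C#. The class and method names (TokenCounter, BuildRequestBody, ParseInputTokens, CountAsync) are illustrative and not part of Smoower or its harness; the model id and API key are whatever you supply. One assumption worth stating: the count covers the whole message, including a few tokens of message framing, so compare the delta between two variants rather than trusting a single absolute number.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Illustrative helper, not part of Smoower: counts Claude tokens for a code snippet.
public static class TokenCounter
{
    private const string Endpoint = "https://api.anthropic.com/v1/messages/count_tokens";

    // Wraps the snippet in a single user message, the shape the endpoint expects.
    public static string BuildRequestBody(string model, string code) =>
        JsonSerializer.Serialize(new
        {
            model,
            messages = new[] { new { role = "user", content = code } }
        });

    // The response is a small JSON object carrying an "input_tokens" field.
    public static int ParseInputTokens(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        return doc.RootElement.GetProperty("input_tokens").GetInt32();
    }

    // Sends one counting request and returns the reported token count.
    public static async Task<int> CountAsync(HttpClient http, string apiKey, string model, string code)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, Endpoint);
        request.Headers.Add("x-api-key", apiKey);
        request.Headers.Add("anthropic-version", "2023-06-01");
        request.Content = new StringContent(BuildRequestBody(model, code), Encoding.UTF8, "application/json");

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return ParseInputTokens(await response.Content.ReadAsStringAsync());
    }
}
```

Call CountAsync once with the long-form snippet and once with the minified one, using the same model id you generate with, and subtract. The subtraction is what cancels the framing overhead.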

The thing I got wrong

Here is the part that humbled me.

I assumed the cute single-letter swaps were doing the heavy lifting. .Where( becomes .w(. .Select( becomes .s(. Looks shorter, must be cheaper, right?

.Where to .w saves zero tokens. None. Where is already a single token to the tokenizer, and so is Select. The same goes for db.SaveChanges() shrinking to a sync helper: also zero, because the win on EF Core comes entirely from dropping the long ...Async suffixes, not from the dots and letters.

I had shipped a bunch of swaps on pure vibes. When I measured them one at a time, a chunk of my favourites turned out to be decoration. They make the code shorter to read and keep the style consistent, which is fine, but they are not why the bill drops.

The real savings come from two places: collapsing the long PascalCase identifiers that tokenize into three to five sub-tokens each (FirstOrDefaultAsync, Task<IActionResult>, AddScoped), and the result-fusing terminators that fold an entire await ... return x == null ? NotFound() : Ok(x) dance into one ok1(). Everything else is rounding error.

What the swap looks like on real code

Enough talking about it. Here is one endpoint, lifted straight out of a controller in the samples folder (trimmed to a single dependency so it fits on screen). The normal version first:

[Authorize]
[ApiController]
[Route("entries/{entryId:guid}")]
public class EntryExpiryController : ControllerBase
{
    private readonly IExpiryService _expiryService;

    public EntryExpiryController(IExpiryService expiryService)
    {
        _expiryService = expiryService;
    }

    [HttpGet("expiry-config")]
    [RequireDocumentRole(PermissionLevel.Viewer)]
    public async Task<IActionResult> GetExpiryConfig(Guid entryId)
    {
        try { return Ok(await _expiryService.GetEffectiveConfigAsync(entryId)); }
        catch (KeyNotFoundException) { return NotFound(); }
    }
}

The same thing with Smoower.Minified:

[AUTH, API, RT("entries/{entryId:guid}")]
public class EntryExpiryController(IExpiryService expiryService) : Ctl
{
    [HG("expiry-config")]
    [RequireDocumentRole(PermissionLevel.Viewer)]
    public async Tr GetExpiryConfig(Guid entryId)
    {
        try { return Ok(await expiryService.GetEffectiveConfigAsync(entryId)); }
        catch (KNF) { return nf(); }
    }
}

Notice what did not move. The route "entries/{entryId:guid}", the sub-route "expiry-config", the PermissionLevel.Viewer role, the GET verb, the service call. Every promise the API makes to the outside world is byte-for-byte identical. What shrank is only the framework ceremony: the three attributes folded to [AUTH, API, RT], the constructor boilerplate became a primary constructor, ControllerBase to Ctl, Task<IActionResult> to Tr, KeyNotFoundException to KNF, NotFound() to nf(). (I kept the formatting readable on purpose. The line-packing you will see in the repo's .min files is an input trick, not how I want a model to actually write.)

What the numbers actually say

Once I was measuring properly, with Claude's tokenizer and one variable at a time, the picture got honest and a lot less flashy.

A single hot controller action: around 55% fewer output tokens.
A full CRUD controller: around 35%, because real controllers carry methods that do not compress as well.
A realistic sample app (TodoApi, with a status state machine, WIP limits, recurrence, dashboard aggregation, not toy CRUD): about 25% end to end.
Across a whole project: 10 to 25%.

That last band is the one I most want to be honest about. The catch is that every number above is for the first level only. Smoower runs as a dial, not a switch, and the first level (L1, plain aliases) rewrites framework ceremony and nothing else. At that level it buys roughly 30% on controllers, about 15% on logic-heavy services, and a flat 0% on entities and DTOs, because aliases simply do not reach into them. Add it up across a repo and L1 alone lands in that 10 to 25 band, not the 55% a single cherry-picked snippet would let me brag about. To push past it you turn the dial up, which is the next thing.
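The "add it up across a repo" step is just a weighted average, and a small sketch makes the band easier to sanity-check. The per-category savings (30%, 15%, 0%) are the figures above; the token shares in the usage note are a made-up mix for illustration, not a measurement of any real project, and BlendedSavings is a hypothetical helper name.

```csharp
// Illustrative helper: blends per-category savings by each category's share of output tokens.
public static class BlendedSavings
{
    // Each part is (share of the repo's tokens, fractional saving in that category).
    public static double Blend(params (double Share, double Saving)[] parts)
    {
        double weighted = 0, shareSum = 0;
        foreach (var (share, saving) in parts)
        {
            weighted += share * saving;
            shareSum += share;
        }
        // Normalise so the shares do not have to sum to exactly 1.
        return weighted / shareSum;
    }
}
```

With an assumed mix of 40% controllers, 40% services, and 20% entities and DTOs, Blend((0.40, 0.30), (0.40, 0.15), (0.20, 0.0)) comes out at about 0.18, inside the band. Shift the mix toward entities and the figure slides toward the floor.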

The level that scales with the codebase

L1 is per-snippet. It shrinks a controller by about the same fraction whether you have one controller or two hundred. Useful, but flat. The percentage does not get better as you grow.

The second level, Mapped (L2), is the one that does, and it is the one I am still most careful about. This is where the dial reaches past framework ceremony into the business logic, the models, the domain identifiers themselves. You shorten the internal C# names while leaving the contract frozen, and you pin every long form once in a names.map so an editor can show you the readable version on demand.

The mechanism matters, so stay with me for a second. A name like RecurrenceDays costs 8 Claude tokens. OrganizationId and CreatedAt are 6 each. TodoTask is 5. Every time the model writes one of those it pays the full count, and a real domain model says them constantly: the entity, the DTO, the AutoMapper profile, three queries, the validator, the tests. The cost is not the declaration. It is the hundred references to the declaration.

You can cut those without moving a single byte a client sees. RecurrenceDays becomes a 2-token internal name carried by a [Col("recurrence_days")] or [JPN("recurrenceDays")] attribute that keeps the database column and the JSON exactly as they were. The wire contract does not change. Only the symbol the model has to type does. (This is exactly where the flat 0% on entities from L1 stops being flat. L1 leaves those names alone, L2 goes in, but only ever behind a carrier that freezes what is promised.)
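A minimal sketch of the carrier idea, written with the standard attributes so it runs without Smoower. I am assuming here that [Col] and [JPN] stand in for ColumnAttribute and JsonPropertyNameAttribute; the short name Rd and the CarrierDemo helper are hypothetical, chosen only for the example.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Text.Json;
using System.Text.Json.Serialization;

public class TodoTask
{
    // Short internal symbol; the attributes pin the external names.
    [Column("recurrence_days")]           // database column name stays as it was
    [JsonPropertyName("recurrenceDays")]  // JSON property name stays as it was
    public int? Rd { get; set; }
}

public static class CarrierDemo
{
    // Serializes with System.Text.Json, which honours JsonPropertyName.
    public static string ToJson(TodoTask task) => JsonSerializer.Serialize(task);

    // Reads the long wire name back into the short property.
    public static TodoTask FromJson(string json) =>
        JsonSerializer.Deserialize<TodoTask>(json)
        ?? throw new InvalidOperationException("JSON payload was null.");
}
```

CarrierDemo.ToJson(new TodoTask { Rd = 7 }) produces {"recurrenceDays":7}, and FromJson reads the same payload back into Rd. A client never sees the short name.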

Here is why it is a different animal. That carrier is a one-time cost. You declare the mapping once, and every reference after it is pure savings, so the math is not per-file, it is per-occurrence across the entire repo. It compounds as the codebase grows, and it compounds fast. The measured numbers, Claude tokenizer again: on a toy codebase where an identifier appears two to five times, the deeper renaming nets about 32 tokens after you pay for the mapping. Barely worth getting out of bed for. Push the same identifier to ten uses and it is roughly 3,000 tokens. At thirty uses, around 9,700. Same names, same one-time cost, the savings just keep stacking because a large codebase writes OrganizationId hundreds of times and you only declared the short form once.
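The per-occurrence arithmetic for a single identifier can be sketched like this. The 8-token and 2-token figures are the RecurrenceDays numbers from above; the 15-token carrier cost in the usage note is a placeholder I picked for illustration, not a measured value, and the sketch does not try to reproduce the repo-wide totals. ShortNameEconomics is a hypothetical name.

```csharp
// Illustrative helper: net token saving from renaming one identifier behind a carrier.
public static class ShortNameEconomics
{
    // Saving per reference times number of references, minus the one-time carrier cost.
    public static int NetSaving(int longTokens, int shortTokens, int carrierTokens, int uses) =>
        uses * (longTokens - shortTokens) - carrierTokens;

    // Smallest number of uses at which the rename is strictly profitable,
    // or -1 when the short name is not actually cheaper and can never pay off.
    public static int BreakEvenUses(int longTokens, int shortTokens, int carrierTokens)
    {
        int perUse = longTokens - shortTokens;
        if (perUse <= 0) return -1;
        return carrierTokens / perUse + 1;
    }
}
```

With those inputs, NetSaving(8, 2, 15, 2) is -3, NetSaving(8, 2, 15, 30) is 165, and BreakEvenUses(8, 2, 15) is 3. The shape is the point: one fixed cost, then a constant saving on every further reference.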

That is the opposite shape from the alias layer. Aliases hand you a flat percentage. Short-naming hands you something that gets better the more code you have, which is exactly backwards from how most optimisations age, and it is the reason a big .NET solution can clear the top of that 10 to 25 band while a sample app sits near the floor.

It is also the easiest lever to get wrong, which is why I gate it the hardest:

Only shorten names with real headroom. Id, Title, and Status are already one or two tokens, so there is nothing to win. Description to Desc actually comes out negative on Claude's tokenizer, so that "obvious" shortening costs you. Measure each one before you commit it.
An identifier that shows up once or twice is a net loss. The break-even is real, and below it you are paying the carrier cost for nothing. Only the hot, repeated names earn their keep.
Never alias an enum type through a global using. It quietly breaks switch and nameof, and that is a debugging session nobody signed up for.

Done with discipline, this is where a large codebase finally gets the savings the snippet demos only tease. Done carelessly, it is how you end up with a name three teammates now have to expand in their heads on every read. I lean conservative on this one for a reason.

How I decide what ships

The rule I landed on is simple and slightly annoying to follow: a swap only ships if I can show the token delta with the real tokenizer. No delta, no ship.

That killed a few ideas. I tried pushing past readable aliases into a numeric, cryptic encoding, on the theory that shorter symbols must mean fewer tokens. It came out worse than the readable declarative form (94 tokens versus 88 on the same action). The big jump past plain aliases came from a source-generator convention that deletes implied structure, not from making the code look like line noise. So readability stayed, and the cryptic experiment died in the benchmark folder where it belongs.

The third level, Max (L3), is the one I deliberately fenced off. It strips whitespace and newlines on top of L2, and it saves a real amount on conventional code, somewhere between 15 and 37% on multi-line commented files, because leading indentation tokenizes into its own discrete tokens (about 0.75 Claude tokens per whitespace character, which surprised me). But I only trust that for input and storage, where the model is reading. How often a model breaks when asked to generate single-line packed code is still untested, so until the pack-and-expand tooling round-trips deterministically and I have the reliability numbers, L3 stays a storage trick, not an output one.

Which is the other thing I now measure: not just whether the code is shorter, but whether the model reliably writes it. Across 30 generations on 15 tasks, not one fell back to the long-form [ApiController] or ControllerBase. The declaration layer is solid. The failures, and there were a few, all lived in the hand-written escape-hatch body, including one genuinely nasty one where the model wrote a fire-and-forget call that compiled fine and would have been a silent runtime bug. Shorter is worthless if it is also wrong, so that is where the next round of work goes.
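For readers who have not been bitten by it, this is the general shape of a fire-and-forget bug. It is not the code the model produced, just the smallest illustration of the same class of mistake; AuditLog and Handlers are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public class AuditLog
{
    public int Written { get; private set; }

    // Simulates an asynchronous write that takes a moment to complete.
    public async Task WriteAsync(string entry)
    {
        await Task.Delay(50);
        if (string.IsNullOrEmpty(entry)) throw new ArgumentException("Entry is empty.", nameof(entry));
        Written++;
    }
}

public static class Handlers
{
    // Compiles cleanly, but returns before the write has finished,
    // and any exception thrown inside WriteAsync is never observed by the caller.
    public static void SaveFireAndForget(AuditLog log, string entry)
    {
        log.WriteAsync(entry);
    }

    // The correct form: the caller waits for the write and sees its failures.
    public static async Task SaveAwaited(AuditLog log, string entry)
    {
        await log.WriteAsync(entry);
    }
}
```

Right after SaveFireAndForget returns, Written is still 0, and a failing write would vanish without a trace. Nothing in the build output flags it, which is why this kind of failure needs a check beyond "does it compile".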

I will publish the full harness and the per-mapping deltas as they firm up. For now the findings file has the tables, and you can rerun the economics script yourself if you do not believe my arithmetic. Please do. Finding the swap that secretly saves nothing is the most useful thing you could send me right now.