liildev

How to design metered usage systems that stay transparent and trustworthy — from balance display to graceful degradation when credits run out.

#ai #product #ux #saas

Credit-based usage is one of the more interesting UX challenges in AI products. You're asking users to think about cost in a context where they're focused on output — a fundamentally bad time to interrupt someone.

The core tension

Users want to use the product. They don't want to think about credits. Your business needs them to think about credits, at least a little, or they burn through their allocation and churn.

The naive approach — show a balance counter prominently, block actions when it hits zero — creates anxiety. Users self-censor. They avoid exploring. Engagement drops before the balance does.

What actually works

1. Show balance in context, not globally.

A global credit counter in the nav is ambient anxiety. Instead, surface the cost estimate at the point of action — right before the user submits a generation. "This will use ~12 credits" is useful. A persistent "342 credits remaining" badge is noise.
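A minimal sketch of a point-of-action estimate, assuming credits are priced per 1,000 expected tokens. The rate, the minimum charge, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical pricing constants; real rates depend on your product.
CREDITS_PER_1K_TOKENS = 4
MIN_CREDITS_PER_GENERATION = 1


def estimate_credits(expected_tokens: int) -> int:
    """Estimate the cost of one generation, rounded up to whole credits."""
    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (expected_tokens * CREDITS_PER_1K_TOKENS + 999) // 1000
    return max(MIN_CREDITS_PER_GENERATION, raw)


def cost_label(expected_tokens: int) -> str:
    """Text shown next to the submit button, right before the action."""
    return f"This will use ~{estimate_credits(expected_tokens)} credits"


print(cost_label(3000))
```

Rounding up keeps the estimate from undershooting, so the settled cost should rarely be higher than what the user saw.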

2. Degrade gracefully, not abruptly.

When a user runs low, don't wall them immediately. Let the current operation finish. Show a low-balance warning after the result, not before. The user got value first; now they're receptive to a top-up prompt.
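One way to sketch this: attach the low-balance notice to the response after the output exists, rather than checking it up front. The threshold, the `GenerationResponse` shape, and the notice copy are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: roughly two generations at 12 credits each.
LOW_BALANCE_THRESHOLD = 24


@dataclass
class GenerationResponse:
    output: str
    balance_after: int
    notice: Optional[str]  # shown below the result, never in front of it


def finish_generation(output: str, balance_before: int, cost: int) -> GenerationResponse:
    """Build the response once the operation has already completed."""
    balance_after = max(0, balance_before - cost)
    notice = None
    if balance_after < LOW_BALANCE_THRESHOLD:
        notice = "You're running low. Get more generations?"
    return GenerationResponse(output, balance_after, notice)
```

Because the notice travels with the finished result, the UI cannot show the top-up prompt before the user has received their output.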

3. Make refills feel natural, not transactional.

The best credit top-up UX I've built felt like a checkout flow that happened to sell credits, not like hitting a paywall. The framing matters: "Get more generations" beats "Purchase credits."

4. Keep the math legible.

If "1 generation = 12 credits" and the user has "342 credits," they should be able to quickly estimate how many generations they have left. Round numbers help. Consider surfacing "~28 generations remaining" (342 / 12 = 28.5, rounded down) instead of the raw credit count.
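The conversion is a floor division. A small sketch, with hypothetical function names:

```python
def generations_remaining(credits: int, credits_per_generation: int) -> int:
    """Whole generations the user can still afford, rounded down."""
    if credits_per_generation <= 0:
        raise ValueError("credits_per_generation must be positive")
    return max(0, credits) // credits_per_generation


def balance_label(credits: int, credits_per_generation: int) -> str:
    remaining = generations_remaining(credits, credits_per_generation)
    return f"~{remaining} generations remaining"


print(balance_label(342, 12))  # ~28 generations remaining
```

Rounding down means the label never promises a generation the user cannot actually afford.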

Implementation notes

On the technical side, optimistic credit deduction (subtracting credits on the client before the server confirms the charge) is almost always wrong. Deduct on the server after the operation completes, or at commit time if you're streaming. Client-side prediction causes desync, especially with streaming responses where the actual token count differs from the estimate.

For streaming outputs specifically: show a live token counter during generation if you want users to understand where credits go. Transparency builds trust more than any amount of copy.
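A rough sketch of a live counter: wrap the chunk stream in a generator that yields running totals alongside each chunk. Splitting on whitespace is a crude stand-in for a real tokenizer, and the rate and names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def stream_with_usage(
    chunks: Iterable[str], credits_per_1k_tokens: int = 4
) -> Iterator[Tuple[str, int, float]]:
    """Yield (chunk, tokens_so_far, credits_so_far) for each streamed chunk."""
    tokens = 0
    for chunk in chunks:
        # Assumption: whitespace-separated words approximate tokens.
        tokens += len(chunk.split())
        yield chunk, tokens, tokens * credits_per_1k_tokens / 1000


for chunk, tokens, credits in stream_with_usage(["Hello ", "world, this ", "is a test"]):
    print(f"{tokens} tokens, ~{credits:.3f} credits")
```

The running figure is display-only in this sketch. The amount actually charged should still come from the server when the stream completes.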

The system I built for ContentumAI uses a pre-authorization model: check the balance before starting, reserve credits, then settle at completion. Cancellations refund the reserved credits immediately. This gives users a clean mental model: if they cancel, they don't pay.
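The reserve, settle, and cancel steps can be sketched as an in-memory ledger. This illustrates the pattern and is not the ContentumAI implementation: the class, its method names, and the rule that a settlement never exceeds the reserved amount are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


class InsufficientCredits(Exception):
    pass


@dataclass
class CreditLedger:
    balance: int
    holds: Dict[str, int] = field(default_factory=dict)

    @property
    def available(self) -> int:
        """Balance minus everything currently reserved."""
        return self.balance - sum(self.holds.values())

    def reserve(self, estimate: int) -> str:
        """Pre-authorize: check the balance and place a hold."""
        if estimate > self.available:
            raise InsufficientCredits(f"need {estimate}, have {self.available}")
        hold_id = uuid.uuid4().hex
        self.holds[hold_id] = estimate
        return hold_id

    def settle(self, hold_id: str, actual: int) -> int:
        """Charge actual usage at completion and drop the hold."""
        reserved = self.holds.pop(hold_id)  # KeyError if already closed
        # Assumption: never charge more than was reserved.
        charged = min(max(0, actual), reserved)
        self.balance -= charged
        return charged

    def cancel(self, hold_id: str) -> None:
        """Release the hold without charging anything."""
        self.holds.pop(hold_id)


ledger = CreditLedger(balance=342)
hold = ledger.reserve(12)
ledger.settle(hold, actual=10)
print(ledger.balance)  # 332
```

A production version would keep holds in the database and wrap each step in a transaction, since concurrent requests can otherwise reserve the same credits twice.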