AI salary parsing: from $180k-DOE strings to structured ranges
Walking through the parsing pipeline that turns 11 different salary phrasings into one normalized JSON shape.
Product
Salary is the field developers ask about most and the one that’s hardest to get right. Most ATSs don’t expose a structured comp field at all — the salary range, when it exists, is embedded in the job description as free text. And it’s embedded in roughly 11 different shapes.
Eleven shapes of “$180k”
Here’s a sample of the phrasings we’ve found in the wild:
- “$180,000 – $240,000”
- “$180K to $240K base + equity”
- “Compensation: $180,000-240,000 OTE”
- “Salary range: 180-240k USD”
- “Up to $240k DOE”
- “€180k cash + significant equity”
- “Comp band: $180k–$240k. DOE.”
- “$15,000–$20,000 per month”
- “DOE” (just that)
- “Pay for this role: 180000-240000”
- “Annual compensation: 180–240 thousand”
The parser, in three layers
Layer one is a regex matcher tuned to the most common shapes. If a description contains $Xk–$Yk or $X,000–$Y,000, we capture it deterministically. This catches roughly 70% of postings.
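As a rough illustration of what a layer-one matcher could look like, here is a minimal Python sketch. The pattern, the function name match_salary_range, and the decision to return only min and max are assumptions made for this example, not the production regex. It covers only the two dollar-prefixed shapes named above and deliberately ignores currency and period.

```python
import re
from typing import Optional

# Hypothetical pattern for "$180k-$240k", "$180K to $240K" and "$180,000 – $240,000".
# One amount: a dollar sign, digits (with or without thousands separators), optional k/K.
_AMOUNT = r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)\s?([kK])?"
# Two amounts joined by a hyphen, an en dash, an em dash, or the word "to".
_RANGE = re.compile(_AMOUNT + r"\s?(?:[-\u2013\u2014]|to)\s?" + _AMOUNT)


def _to_number(digits: str, k_suffix: Optional[str]) -> int:
    """Turn '180,000' or ('180', 'k') into 180000."""
    value = int(digits.replace(",", ""))
    return value * 1000 if k_suffix else value


def match_salary_range(description: str) -> Optional[dict]:
    """Return {'min', 'max'} for the first dollar range found, else None."""
    m = _RANGE.search(description)
    if m is None:
        return None
    low = _to_number(m.group(1), m.group(2))
    high = _to_number(m.group(3), m.group(4))
    if low > high:
        # An inverted range is more likely a misparse than a real band.
        return None
    return {"min": low, "max": high}
```

Phrasings such as "Up to $240k DOE" or a bare "DOE" fall through with None, which is what hands them to the next layer.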
Layer two is a structured prompt sent to a small language model (we use Haiku to keep costs down). We pass the description and ask for min, max, currency, period, and includes_equity under a strict JSON schema. If the model reports confidence above 0.85, we accept its answer.
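The acceptance step can be sketched without any model call at all: take the raw reply text, check it against the expected fields, and apply the 0.85 threshold. The sketch below assumes the confidence arrives as a confidence field in the same JSON object and that min and max are always integers. Both are assumptions for illustration, and accept_model_reply is a hypothetical name.

```python
import json
from typing import Optional

CONFIDENCE_THRESHOLD = 0.85  # threshold quoted in the post

# Field names come from the post; the Python types are assumed for this sketch.
REQUIRED_FIELDS = {
    "min": int,
    "max": int,
    "currency": str,
    "period": str,
    "includes_equity": bool,
}


def accept_model_reply(raw_reply: str) -> Optional[dict]:
    """Return the compensation fields if the reply is well-formed and confident."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(reply, dict):
        return None

    for name, expected_type in REQUIRED_FIELDS.items():
        value = reply.get(name)
        if not isinstance(value, expected_type):
            return None
        # bool is a subclass of int in Python, so reject True/False as amounts.
        if expected_type is int and isinstance(value, bool):
            return None

    confidence = reply.get("confidence")  # assumed field name
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if confidence <= CONFIDENCE_THRESHOLD:
        return None
    if reply["min"] > reply["max"]:
        return None

    # Drop the confidence score; only the normalized fields are kept.
    return {name: reply[name] for name in REQUIRED_FIELDS}
```

Treating malformed JSON, a missing field, and a low score identically keeps the caller simple: anything other than a dict means "not accepted".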
Layer three is the abstain layer. If neither the regex nor the LLM gives high-confidence output, we set compensation: null. A null is better than a wrong number — wrong numbers compound when customers filter by them.
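Put together, the three layers amount to "try each extractor in order, and abstain if none commits". The following is a minimal sketch of that control flow only. The Extractor type and parse_compensation are hypothetical names, and each layer is assumed to be a function that returns a dict when it is confident and None otherwise.

```python
from typing import Callable, Optional, Sequence

# Assumed contract: a layer returns a compensation dict, or None when unsure.
Extractor = Callable[[str], Optional[dict]]


def parse_compensation(description: str, layers: Sequence[Extractor]) -> dict:
    """Run layers in order; the first confident answer wins, otherwise abstain."""
    for layer in layers:
        result = layer(description)
        if result is not None:
            return {"compensation": result}
    # Abstain layer: an explicit null rather than a guess.
    return {"compensation": None}
```

With this shape, the cheap deterministic layer goes first in the list and the model-backed layer second, so the model is only consulted for descriptions the regex could not handle.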
What the output looks like
{
  "compensation": {
    "min": 180000,
    "max": 240000,
    "currency": "USD",
    "period": "yearly",
    "includes_equity": true
  }
}
Same shape, every source, every job. The free-text mess stays in the description field for anyone who wants it. Most don’t.
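On the consuming side, the main thing to plan for is that compensation can be null. A hedged sketch of a salary filter over already-fetched job objects follows. The function name jobs_paying_at_least and the list-of-dicts input are assumptions for this example, not part of the API.

```python
from typing import Iterable, List


def jobs_paying_at_least(
    jobs: Iterable[dict],
    floor: int,
    currency: str = "USD",
    period: str = "yearly",
) -> List[dict]:
    """Keep jobs whose top of range reaches `floor` in the given currency and period."""
    matches = []
    for job in jobs:
        comp = job.get("compensation")
        if comp is None:
            # The parser abstained; skip rather than guess.
            continue
        if comp["currency"] != currency or comp["period"] != period:
            # Mixed currencies or periods are not comparable without conversion.
            continue
        if comp["max"] >= floor:
            matches.append(job)
    return matches
```

Skipping null and non-matching currencies is a design choice for this sketch. A caller who prefers recall over precision could instead surface those jobs in a separate "salary unknown" bucket.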
Try it free — 5,000 requests/month, no credit card.