Job aggregator

Definition

A job aggregator is a system that collects job postings from many sources - company career pages, applicant tracking systems, and other boards - and combines them into a single searchable, deduplicated index.

Also called: job aggregation, jobs aggregator.

Key points

A job aggregator indexes job postings from many external sources rather than hosting employer-submitted listings.
Most aggregators are backends - the data layer under boards, sourcing tools, and research products - not consumer sites.
Collection, deduplication, freshness, and normalization are the four problems, and all four get hard at scale.
Build the aggregator if coverage is your edge; rent the data layer if it is plumbing under a different value proposition.

What a job aggregator does

A job aggregator's job is breadth. Instead of hosting postings that employers submit directly, it goes out and collects postings that already exist across the web - on ATS-hosted career pages, on other job boards, in employer feeds - and pulls them into one place. Large consumer sites like Indeed are the familiar example, but most aggregators are not consumer destinations. They are backends: the data layer under a niche job board, a sourcing tool, a recruiting CRM, or a labor-market research product.

The defining feature is that the aggregator does not own the postings. It indexes other people's. That makes coverage a function of how many sources it crawls and how well it keeps up, rather than how many employers it has signed up.

The four problems every aggregator has to solve

Collection is only the first problem. The aggregator also has to deduplicate - the same job appears on the company's Greenhouse page, on Indeed, and on LinkedIn, and counting it three times makes the index look padded and useless. It has to track freshness - a posting that was filled last week but still shows as open erodes trust fast. And it has to normalize - postings arrive in dozens of shapes and have to become one schema before anyone can search across them.
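As a rough illustration of the normalize and deduplicate steps, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Posting schema, the alternative field names (job_title, employer, city, link), and the choice of title + company + location as the duplicate key. Production systems usually need fuzzier matching than an exact key.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """Hypothetical unified schema; the field names are illustrative only."""
    title: str
    company: str
    location: str
    source: str
    url: str


def _clean(text: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def normalize(raw: dict, source: str) -> Posting:
    # Each source names its fields differently; map them onto one schema.
    # The alternative keys are made-up examples, not real API fields.
    return Posting(
        title=raw.get("title") or raw.get("job_title") or "",
        company=raw.get("company") or raw.get("employer") or "",
        location=raw.get("location") or raw.get("city") or "",
        source=source,
        url=raw.get("url") or raw.get("link") or "",
    )


def dedup_key(posting: Posting) -> str:
    # Assumption: same cleaned title, company, and location means same job.
    basis = "|".join(
        _clean(part)
        for part in (posting.title, posting.company, posting.location)
    )
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def deduplicate(postings):
    # Keep the first posting seen for each key, preserving input order.
    seen = {}
    for posting in postings:
        seen.setdefault(dedup_key(posting), posting)
    return list(seen.values())


# The same job from two differently shaped sources, plus one distinct job.
raw_feed = [
    ({"title": "Senior Data Engineer", "company": "Acme",
      "location": "Berlin", "url": "https://example.com/a"}, "ats"),
    ({"job_title": "Senior  Data Engineer!", "employer": "ACME",
      "city": "berlin", "link": "https://example.com/b"}, "board"),
    ({"title": "Data Analyst", "company": "Acme",
      "location": "Berlin", "url": "https://example.com/c"}, "ats"),
]
index = deduplicate(normalize(raw, source) for raw, source in raw_feed)
```

Here the first two records collapse into one entry and the first-seen copy wins. Listing the preferred source first (for example the employer's own ATS page) is one simple way to decide which copy survives.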

None of those four is hard for a hundred postings. All of them are hard for a million, refreshed daily, across thousands of sources that each change their format on their own schedule. That gap between the prototype and the production system is where most build-it-yourself aggregator projects stall.
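A back-of-the-envelope calculation gives a feel for that gap. The sketch below assumes, purely for illustration, one fetch per posting per refresh; real crawlers often pull many postings per request, so treat the numbers as an order-of-magnitude guide rather than a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_fetch_rate(postings: int, refreshes_per_day: float) -> float:
    """Average fetches per second, assuming one fetch per posting per refresh."""
    return postings * refreshes_per_day / SECONDS_PER_DAY


prototype = sustained_fetch_rate(100, 1)       # about 0.001 fetches per second
daily = sustained_fetch_rate(1_000_000, 1)     # about 11.6 fetches per second
hourly = sustained_fetch_rate(1_000_000, 24)   # about 277.8 fetches per second
```

A hundred postings is a script run by hand once a day; a million postings is a continuous workload that has to spread itself across thousands of sources without pause.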

Build the aggregator or rent the data layer

If aggregation is your product's core differentiation - you are competing on coverage or freshness nobody else has - building it is defensible. If aggregation is plumbing underneath a product whose value is elsewhere, such as the matching algorithm, the workflow, or the analytics, then building the crawl, dedup, and normalization layer is months of effort spent on something that is not your edge. Renting a jobs-data API for that layer is usually the faster path.

The decision is not permanent. Plenty of teams rent the data layer to get to market, validate that the product works, and revisit building in-house only once jobs data is proven worth owning.

FAQ

What is the difference between a job aggregator and a job board?

A job board hosts postings that employers submit to it directly - the board owns its inventory. A job aggregator collects postings that already exist elsewhere and indexes them. Many sites are both: they accept direct postings and also aggregate. The practical difference is the coverage model - a board grows by signing up employers, an aggregator grows by adding sources.

Is building a job aggregator legal?

Collecting publicly posted job listings is generally defensible, but the answer depends on how you collect them, which terms of service apply, and your jurisdiction. Public ATS career pages are a cleaner upstream source than boards whose terms prohibit scraping. This is not legal advice - if aggregation is central to your business, get a qualified legal review of your data path.

How fresh does an aggregator's data need to be?

Fresh enough that users do not routinely hit dead postings. In practice a refresh cycle measured in hours, not days, is the bar for an aggregator anyone relies on. Stale data - jobs shown as open after they are filled - is the fastest way to lose trust, so freshness tracking matters as much as raw coverage.
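One simple way to track freshness is to record when each posting was last seen at its source and flag anything not re-confirmed within a set window. The sketch below assumes a six-hour window and a plain dict of last-seen timestamps; both are illustrative choices, not a recommendation.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold for the example: not re-confirmed within 6 hours = stale.
MAX_AGE = timedelta(hours=6)


def is_stale(last_seen: datetime, now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    # A posting is stale when its source has not confirmed it recently.
    return now - last_seen > max_age


def split_by_freshness(last_seen_by_id: dict, now: datetime):
    # Partition posting IDs into (live, stale), preserving input order.
    live, stale = [], []
    for job_id, last_seen in last_seen_by_id.items():
        (stale if is_stale(last_seen, now) else live).append(job_id)
    return live, stale


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen_by_id = {
    "job-a": now - timedelta(hours=1),  # confirmed in the latest crawl
    "job-b": now - timedelta(days=2),   # missing from recent crawls
}
live, stale = split_by_freshness(last_seen_by_id, now)
```

Stale postings can then be re-checked at the source or hidden from search, instead of staying listed as open indefinitely.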

JobsPipe is the jobs-data API behind this glossary - 30+ sources, one schema, free tier included.
