Guide·Jun 28, 2026·8 min read

How to scrape LinkedIn jobs in Python (and the API that replaces it)

A real Python walkthrough of LinkedIn's guest jobs endpoint with requests and BeautifulSoup - parsing cards, pulling job detail, and why it breaks in production. Then the API where LinkedIn is the main source.


Eng team

Engineering

LinkedIn has the deepest job graph on the internet and the most hostile surface to scrape it from. This post is the code-level version: a real Python walkthrough of the one semi-stable endpoint, how to parse it, and where it falls over in production. For the higher-level overview of methods, tools, and legal posture, see our guide to scraping LinkedIn jobs.

The guest jobs endpoint

You do not need to touch the logged-in app. LinkedIn exposes a logged-out “guest” jobs API that returns server-rendered HTML fragments of job cards - no auth, no JSON, just HTML you parse:

https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search
  ?keywords=python+developer
  &location=United+States
  &start=0

The start parameter is a result offset: step it by 25 to move from one page to the next (0, 25, 50, ...). Each response is a list of <li> elements, one per job, carrying the title, company, location, and a link to the full posting.
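To make the paging concrete, here is a minimal sketch that builds the URL for a given page using only the standard library. The helper name search_url and the PAGE_SIZE constant are illustrative, not part of any LinkedIn API; the step of 25 is taken from the description above.

```python
from urllib.parse import urlencode

SEARCH = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
PAGE_SIZE = 25  # offset step described above; an assumption if LinkedIn changes it


def search_url(keywords, location, page):
    """Build the guest search URL for a zero-based page index (hypothetical helper)."""
    query = urlencode({
        "keywords": keywords,
        "location": location,
        "start": page * PAGE_SIZE,  # page 0 -> 0, page 1 -> 25, page 2 -> 50
    })
    return f"{SEARCH}?{query}"


# Print the first three page URLs; no request is sent.
for page in range(3):
    print(search_url("python developer", "United States", page))
```

urlencode encodes spaces as +, so the generated query string matches the hand-written URL above.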

A working scraper

import time
import requests
from bs4 import BeautifulSoup

SEARCH = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}

def scrape_linkedin_jobs(keywords, location, pages=3):
    """Collect job cards from the guest search endpoint, one page at a time."""
    jobs = []
    for page in range(pages):
        # start is a result offset, so page N begins at N * 25
        params = {"keywords": keywords, "location": location, "start": page * 25}
        resp = requests.get(SEARCH, params=params, headers=HEADERS, timeout=20)
        if resp.status_code != 200:
            # 429 or a block page: stop rather than hammer the endpoint
            print(f"stopped at page {page}: HTTP {resp.status_code}")
            break
        soup = BeautifulSoup(resp.text, "html.parser")
        cards = soup.select("li")
        if not cards:
            break  # an empty fragment means there are no more results
        for card in cards:
            title = card.select_one("h3.base-search-card__title")
            company = card.select_one("h4.base-search-card__subtitle")
            link = card.select_one("a.base-card__full-link")
            base = card.select_one("div.base-card")
            if not title:
                continue  # not a job card
            # .get() avoids a KeyError when an attribute is missing
            href = link.get("href") if link else None
            urn = base.get("data-entity-urn") if base else None
            jobs.append({
                "title": title.get_text(strip=True),
                "company": company.get_text(strip=True) if company else None,
                "url": href.split("?")[0] if href else None,  # drop tracking params
                "job_id": urn.split(":")[-1] if urn else None,
            })
        time.sleep(2)  # stay at a trickle between pages
    return jobs

for job in scrape_linkedin_jobs("python developer", "United States"):
    print(job)

The data-entity-urn attribute holds a value like urn:li:jobPosting:3741290021 - the trailing number is the job ID you use to pull full detail.
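If you want the ID extraction to fail loudly on unexpected values instead of returning whatever follows the last colon, a stricter parse is one option. This is a sketch; job_id_from_urn is a hypothetical helper, and the urn:li:jobPosting prefix is assumed from the example value above.

```python
def job_id_from_urn(urn):
    """Return the numeric job ID from a jobPosting URN, or None if it does not match.

    Assumes the urn:li:jobPosting:<digits> shape shown in the example above.
    """
    if not urn:
        return None
    # Split on the last colon: everything before it must be the expected prefix.
    prefix, _, tail = urn.rpartition(":")
    if prefix != "urn:li:jobPosting" or not tail.isdigit():
        return None
    return tail


print(job_id_from_urn("urn:li:jobPosting:3741290021"))
```

The ID is kept as a string because it is only ever interpolated into a URL.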

Pulling the full job detail

The search endpoint gives you the card; a second guest endpoint gives you the description, seniority, and employment type:

def get_job_detail(job_id):
    """Fetch one posting from the guest detail endpoint and parse it."""
    url = f"https://www.linkedin.com/jobs-guest/jobs/api/jobPosting/{job_id}"
    resp = requests.get(url, headers=HEADERS, timeout=20)
    resp.raise_for_status()  # raises on 404, 429, and other error statuses
    soup = BeautifulSoup(resp.text, "html.parser")
    desc = soup.select_one("div.show-more-less-html__markup")
    # Criteria values (seniority, employment type, ...) in page order, without labels
    criteria = [
        c.get_text(strip=True)
        for c in soup.select("span.description__job-criteria-text")
    ]
    return {
        "description": desc.get_text(" ", strip=True) if desc else None,
        "criteria": criteria,
    }

print(get_job_detail("3741290021"))
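The two steps chain together: take the job_id from each card and merge the detail fields into it. A minimal sketch of that loop follows. enrich_jobs is a hypothetical helper; the detail function is passed in as a parameter so the loop can be exercised without network access, and the broad except is an illustrative choice, not a recommendation.

```python
import time


def enrich_jobs(jobs, get_detail, delay=2.0, sleep=time.sleep):
    """Return copies of the job dicts with detail fields merged in (hypothetical helper).

    get_detail is any callable taking a job ID and returning a dict,
    for example the get_job_detail function above.
    """
    enriched = []
    for job in jobs:
        job_id = job.get("job_id")
        if not job_id:
            enriched.append(dict(job))  # nothing to look up; keep the card as-is
            continue
        try:
            detail = get_detail(job_id)
        except Exception as exc:  # broad on purpose: one bad posting should not end the run
            detail = {"detail_error": str(exc)}
        enriched.append({**job, **detail})
        sleep(delay)  # pause between detail requests
    return enriched
```

Called as enrich_jobs(scrape_linkedin_jobs("python developer", "United States"), get_job_detail), this makes one detail request per card, so a three-page run is already several dozen requests against an endpoint that tolerates very few.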

Why this breaks in production

  • Rate limiting. The guest endpoint tolerates a trickle of traffic. Push it and you get 429 responses, then IP blocks. Real coverage needs a rotating residential proxy pool.
  • It is a subset. The guest API exposes a fraction of what the logged-in search shows, with coarser filters. Scraping the authenticated app means real accounts, and LinkedIn bans accounts used for automation - this is the most aggressively defended target in the space.
  • Markup drift. The class names above (base-search-card__title and friends) change, and the parser breaks silently when they do.
  • Terms of Service. Automated access is against LinkedIn’s User Agreement. The hiQ v. LinkedIn line of cases turns on the Computer Fraud and Abuse Act and access to public data, not on LinkedIn’s contract terms, so a breach of the User Agreement remains a separate risk for a commercial product. Not legal advice.
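Two of these failure modes can at least be made visible in code. The sketch below shows one way to back off on 429 responses and one way to turn silent markup drift into a loud error. Both helpers (fetch_with_backoff, check_parse_health) and their default thresholds are illustrative assumptions, and neither removes the underlying limits: backoff does not lift an IP block, and a health check only tells you the parser broke.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute, e.g. lambda: requests.get(url, headers=HEADERS).
    Returns the last response either way; the caller checks its status.
    """
    resp = None
    for attempt in range(max_attempts):
        resp = fetch()
        if resp.status_code != 429:
            return resp
        if attempt < max_attempts - 1:
            # Exponential backoff (2s, 4s, 8s, ...) plus up to 50% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return resp


def check_parse_health(total_cards, parsed_jobs, min_ratio=0.5):
    """Raise if too few cards produced a job, which usually means the markup changed."""
    if total_cards == 0:
        return  # an empty page is handled by the caller, not treated as drift
    ratio = parsed_jobs / total_cards
    if ratio < min_ratio:
        raise RuntimeError(
            f"only {parsed_jobs}/{total_cards} cards parsed; selectors may be stale"
        )
```

In the scraper above, check_parse_health(len(cards), jobs_added_this_page) would run once per page, so a renamed class raises on the first page rather than quietly returning an empty list.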

The API that replaces it

LinkedIn is JobsPipe’s primary source. We run the collection, proxying, and parsing once, for everyone, and serve LinkedIn postings through the same normalized endpoint as every other source - so you write zero scraping code:

curl https://api.jobspipe.dev/v1/jobs/search \
  -H "Authorization: Bearer jp_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "job_title_or": ["python developer"],
    "job_country_code_or": ["US"],
    "posted_at_max_age_days": 7,
    "limit": 25
  }'

Same record shape from every source - title, company, normalized location, parsed compensation, seniority, posted_at, and an apply_url - de-duplicated across sources, with no proxies and no banned accounts. The free tier is 5,000 requests per month.
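For parity with the scraper, here is the same request sketched in Python. It uses only the standard library so it runs without extra installs; the URL, headers, and body fields are copied from the curl example, while the helper names and the assumption that the response body is JSON are ours.

```python
import json
import urllib.request

API_URL = "https://api.jobspipe.dev/v1/jobs/search"


def build_search_request(api_key, titles, country_codes, max_age_days=7, limit=25):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "job_title_or": list(titles),
        "job_country_code_or": list(country_codes),
        "posted_at_max_age_days": max_age_days,
        "limit": limit,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def search_jobs(api_key, titles, country_codes, **kwargs):
    """Send the request and decode the body, assuming the API responds with JSON."""
    req = build_search_request(api_key, titles, country_codes, **kwargs)
    with urllib.request.urlopen(req, timeout=20) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling search_jobs("jp_live_your_key_here", ["python developer"], ["US"]) sends the same body as the curl command; inspect the returned structure before relying on specific keys.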
