How to scrape Google Jobs in Python (and the no-scrape alternative)
A working Python walkthrough for scraping Google Jobs results with requests and Playwright, the anti-bot and legal caveats nobody mentions, and how to skip all of it with one normalized API call.
Eng team
Engineering
Google Jobs is the box at the top of Google when you search something like “data engineer jobs remote”: an aggregated, enriched widget that pulls postings from career sites, ATSs, and other boards into one surface. That makes it a tempting scrape target, and one of the harder ones to scrape reliably. This post covers a working Python walkthrough, the caveats that rarely make it into tutorials, and the no-scrape alternative.
What “Google Jobs” actually is
There is no public Google Jobs API for general developers. The results you see are Google’s job search experience (served from google.com/search?ibp=htl;jobs), built on top of the structured JobPosting data that employers and job sites publish on their own pages. Google does not sell that feed, so “scraping Google Jobs” means rendering that search experience and parsing the cards out of it, with all the anti-bot friction Google applies to automated traffic.
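For a sense of what Google is aggregating: JobPosting data is commonly published as schema.org JSON-LD inside a <script type="application/ld+json"> tag on the listing page. The sketch below pulls those objects out of raw HTML using only the standard library. It is a minimal illustration, not a robust parser. The helper names are hypothetical, and the code handles only three layouts: a single object, a list, or an @graph wrapper.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so append
        if self._in_jsonld:
            self.blocks[-1] += data


def _is_job_posting(item):
    kind = item.get("@type")
    return kind == "JobPosting" or (isinstance(kind, list) and "JobPosting" in kind)


def extract_job_postings(html):
    """Return every schema.org JobPosting object embedded as JSON-LD."""
    collector = _JsonLdCollector()
    collector.feed(html)
    postings = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed markup is common in the wild; skip it
        # JSON-LD may be a single object, a list, or an {"@graph": [...]} wrapper
        if isinstance(data, list):
            items = data
        elif isinstance(data, dict):
            items = data.get("@graph", [data])
        else:
            continue
        postings.extend(
            item for item in items if isinstance(item, dict) and _is_job_posting(item)
        )
    return postings
```

On a page that follows Google's JobPosting guidelines, each returned dict carries schema.org properties such as title, datePosted, hiringOrganization, and jobLocation.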
The naive approach: requests + BeautifulSoup
The first thing everyone tries is a plain HTTP GET. It is worth showing because it teaches you exactly where the wall is:
import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_google_jobs(query, location):
    # ibp=htl;jobs asks Google for the jobs experience instead of web results
    params = {"q": f"{query} {location}", "ibp": "htl;jobs", "hl": "en"}
    resp = requests.get(
        "https://www.google.com/search",
        params=params, headers=HEADERS, timeout=20,
    )
    resp.raise_for_status()
    return resp.text

html = fetch_google_jobs("data engineer", "Remote")
soup = BeautifulSoup(html, "html.parser")
# Illustrative class name; Google changes these without notice
cards = soup.select("div.EimVGf")
print(f"found {len(cards)} job cards")
Run this from a clean residential IP a couple of times and it can work. Run it from a datacenter IP, or more than a handful of times, and Google returns a consent interstitial or a CAPTCHA instead of results, so the cards list comes back empty. The widget is also largely rendered client side, so the cards often are not in the static HTML at all. That is why most real scrapers use a browser.
The robust approach: Playwright
A headless browser renders the widget the way a user’s browser does, so the cards exist in the DOM and you can read them:
from urllib.parse import quote_plus

from playwright.sync_api import sync_playwright

def scrape_google_jobs(query, location, max_cards=25):
    # quote_plus also escapes characters like "&" or "#" that a bare
    # .replace(" ", "+") would pass through and break the URL with
    q = quote_plus(f"{query} {location}")
    url = f"https://www.google.com/search?ibp=htl;jobs&hl=en&q={q}"
    jobs = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_context(locale="en-US").new_page()
            page.goto(url, wait_until="domcontentloaded")
            # Wait for the client-side widget to render at least one list item
            page.wait_for_selector("li", timeout=15000)
            for li in page.query_selector_all("li"):
                if len(jobs) >= max_cards:
                    break
                # Illustrative selectors: Google renames these regularly
                title = li.query_selector("div[role='heading']")
                company = li.query_selector("div.nJlQNd")
                if not title:
                    continue  # not a job card
                jobs.append({
                    "title": title.inner_text(),
                    "company": company.inner_text() if company else None,
                })
        finally:
            browser.close()
    return jobs

for job in scrape_google_jobs("data engineer", "Remote"):
    print(job)
Treat the selectors (EimVGf, nJlQNd) as illustrative, not stable - Google reshuffles these class names regularly, and keeping the parser alive is the bulk of the ongoing effort. To run this beyond a few queries, you also need rotating residential proxies, a consent-cookie step per region, and randomized pacing between requests.
The anti-bot and legal caveats
- Bot management. Google aggressively challenges automated traffic. Datacenter IPs draw CAPTCHAs fast; sustained scraping needs a residential or mobile proxy pool, priced per GB.
- Selector drift. The class names above change without notice. A scraper that worked last month silently returns empty fields until someone notices the data went stale.
- Terms of Service. Google’s terms prohibit automated access to Search. The public-data scraping case law (for example hiQ v. LinkedIn) is about the Computer Fraud and Abuse Act, not a site’s contract - so a terms breach is a separate, real risk for a commercial product. This is not legal advice; talk to a lawyer.
- It is a second-hand source. Google Jobs is itself an aggregator, so scraping it means scraping an aggregation of the original ATS and board postings - one more layer of drift and duplication between you and the source of truth.
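One way to catch selector drift before the data goes stale is to track how often each field comes back empty and fail loudly when that rate jumps. Below is a minimal sketch. The field names match the Playwright scraper above. The 20% threshold and the function names are assumptions to tune against your own baseline.

```python
def empty_field_rates(jobs, fields=("title", "company")):
    """Share of records where each field is missing or blank."""
    if not jobs:
        return {field: 1.0 for field in fields}  # zero records is itself a red flag
    return {
        field: sum(1 for job in jobs if not (job.get(field) or "").strip()) / len(jobs)
        for field in fields
    }


def check_for_drift(jobs, fields=("title", "company"), max_empty_rate=0.2):
    """Raise if any field is empty more often than the threshold allows."""
    rates = empty_field_rates(jobs, fields)
    drifted = {field: rate for field, rate in rates.items() if rate > max_empty_rate}
    if drifted:
        raise RuntimeError(f"possible selector drift, empty-field rates: {drifted}")
    return rates
```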
Or skip it: use the JobsPipe API
JobsPipe already runs the scraping infrastructure - the proxies, the rendering, the parsing, and the cross-source de-duplication - across 30+ ATS and job-board sources, and returns one normalized JSON shape. Instead of a browser fleet you make one authenticated request:
curl https://api.jobspipe.dev/v1/jobs/search \
  -H "Authorization: Bearer jp_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "job_title_or": ["data engineer"],
    "remote": true,
    "job_country_code_or": ["US"],
    "posted_at_max_age_days": 7,
    "limit": 25
  }'
Every record comes back the same way - title, company, normalized location, parsed compensation, posted_at, and an apply_url pointing back at the original listing - with no CAPTCHAs, no selector maintenance, and no proxy bill. The free tier is 5,000 requests per month.