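The jobs-search endpoint is the easiest LinkedIn interface to scrape: it is deliberately exposed to unauthenticated visitors, and it paginates through a single start query parameter. At the time of writing, the path is /jobs-guest/jobs/api/seeMoreJobPostings/search, and the response is a fragment of HTML job cards rather than JSON. Because LinkedIn rotates its internal endpoints, re-confirm the path in your browser's developer tools before any real run.
A simple Python example that parses the response with BeautifulSoup: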
import requests
from bs4 import BeautifulSoup

BASE = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_page(keywords, location, start=0):
    """Fetch one page of job cards as raw HTML; start is the result offset."""
    params = {"keywords": keywords, "location": location, "start": start}
    r = requests.get(BASE, params=params, headers=HEADERS, timeout=20)
    r.raise_for_status()  # raise on any 4xx/5xx response
    return r.text


def parse_cards(html):
    """Yield one dict per job card found in the HTML fragment."""
    soup = BeautifulSoup(html, "html.parser")
    for card in soup.select("li"):
        title = card.select_one(".base-search-card__title")
        company = card.select_one(".base-search-card__subtitle")
        loc = card.select_one(".job-search-card__location")
        link = card.select_one("a.base-card__full-link")
        # Skip list items that lack a title or a link.
        if title and link:
            yield {
                "title": title.get_text(strip=True),
                "company": company.get_text(strip=True) if company else None,
                "location": loc.get_text(strip=True) if loc else None,
                # Drop the query string (tracking parameters) from the job URL.
                "url": link["href"].split("?")[0],
            }


# Page through the results until a request returns no cards.
jobs, start = [], 0
while True:
    html = fetch_page("python developer", "Berlin", start)
    batch = list(parse_cards(html))
    if not batch:
        break
    jobs.extend(batch)
    start += 25  # advance the offset by one page
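Each page holds twenty-five job cards, which is why the loop advances start by 25. The stop condition is an empty result set rather than a fixed page count, because LinkedIn filters results by geography and posting date, so the number of pages cannot be known in advance. Feed the jobs list into Python's csv module or a Pandas DataFrame and you have a LinkedIn job feed without using a browser. For a refresher on BeautifulSoup selector patterns, see a BeautifulSoup tutorial.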
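As one possible way to do the csv export mentioned above, here is a minimal sketch using only the standard library. The helper name write_jobs_csv, the FIELDS column order, and the de-duplication by url are assumptions added for illustration and are not part of the original script; the de-duplication is a precaution in case consecutive pages return overlapping cards.

```python
import csv

# Column order for the output file; matches the keys yielded by parse_cards.
FIELDS = ["title", "company", "location", "url"]


def write_jobs_csv(jobs, path):
    """Write job dicts to a CSV file and return the number of rows written.

    Hypothetical helper: rows whose url was already written are skipped,
    and missing values (None) are written as empty strings.
    """
    seen = set()
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for job in jobs:
            url = job.get("url")
            if url in seen:
                continue  # same posting seen on an earlier page
            seen.add(url)
            writer.writerow({key: job.get(key) or "" for key in FIELDS})
            count += 1
    return count
```

With the scraping loop above, a call such as write_jobs_csv(jobs, "jobs.csv") would write one row per unique posting. Opening the file with newline="" follows the csv module's documented recommendation and avoids blank lines between rows on Windows.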