MCP Directory

MCP servers that scrape & crawl web pages

Extract clean content from any URL or crawl whole sites.

9 servers · Last updated June 17, 2026

TL;DR: Beyond search, these servers fetch and clean page content — turning messy HTML into markdown the model can actually use, or crawling entire sites. The differentiators are JavaScript rendering, anti-bot handling, and whether you get structured extraction or raw text.
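To make "turning messy HTML into markdown" concrete, here is a minimal sketch using only the Python standard library. It is not how any server listed here works internally; the class name `MarkdownExtractor`, the function `html_to_markdown`, and the set of skipped tags are all illustrative assumptions.

```python
import re
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-markdown converter: headings, paragraphs and links only."""

    SKIP = {"script", "style", "nav", "footer"}  # assumed "noise" tags
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.href = None     # target of the link currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep word boundaries.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    blocks = (block.strip() for block in "".join(parser.parts).split("\n\n"))
    return "\n\n".join(block for block in blocks if block)


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home | About</nav><h1>Title</h1>"
    '<p>Read the <a href="https://example.com/docs">docs</a> now.</p>'
    "<script>track()</script></body></html>"
)
print(html_to_markdown(sample))
```

A real scraping server does far more (JavaScript rendering, anti-bot handling, tables, images), which is why those features are the differentiators named above.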

Bottom line: if you only try one, start with Firecrawl, the most-starred verified server in this category (6,562★). The other 8 are compared below.

Compare 9 servers

| Server | Transport | Auth | Verified | Stars | Relevant tools |
|---|---|---|---|---|---|
| Firecrawl | Local (stdio) | API key | | 6,562 | firecrawl_scrape, firecrawl_batch_scrape, firecrawl_check_batch_status +5 |
| Bright Data MCP | Local (stdio) | API key | | 5,000 | scrape_as_markdown, scrape_batch, extract |
| Browserbase MCP (Stagehand) | Local (stdio) | API key | | 3,000 | extract |
| Tavily | Local (stdio) | API key | | 2,100 | tavily-extract, tavily-crawl |
| DuckDuckGo Search | Local (stdio) | No auth | | 1,236 | fetch_content |
| Jina AI Reader & Search | Remote (HTTP) | API key | | 730 | extract_pdf |
| Kagi Search | Local (stdio) | API key | | 417 | kagi_extract |
| AgentQL MCP | Local (stdio) | API key | | 400 | extract-web-data |
| Hyperbrowser MCP | Local (stdio) | API key | | 250 | scrape_webpage, crawl_webpages, extract_structured_data |
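Whichever server you pick, a client invokes its tools the same way: with an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below builds one for `firecrawl_scrape` and frames it for a stdio server. The helper names are hypothetical, and the argument names `url` and `formats` are assumptions; check the schema the server returns from `tools/list` before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def frame_for_stdio(message: dict) -> str:
    """stdio transport: one JSON message per line, no embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# Argument names below are assumed for illustration, not taken from this page.
request = tool_call_request(
    "firecrawl_scrape",
    {"url": "https://example.com", "formats": ["markdown"]},
)
line = frame_for_stdio(request)
print(line, end="")
```

In practice an MCP client library handles this framing, along with the `initialize` handshake that must complete before any tool call; the sketch only shows what goes over the wire.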

The servers

Firecrawl

Official Firecrawl MCP server: scrape, crawl, map, search, and structured extraction for any LLM client.

Tools: firecrawl_scrape, firecrawl_batch_scrape, firecrawl_check_batch_status, firecrawl_map, firecrawl_search, firecrawl_crawl

Bright Data MCP

All-in-one web access MCP: Web Unlocker, SERP, Scraper API, and a cloud Scraping Browser.

Tools: scrape_as_markdown, scrape_batch, extract

Browserbase MCP (Stagehand)

Official Browserbase cloud-browser MCP built on Stagehand: natural-language act/extract/observe.

Tools: extract

Tavily

Production-ready MCP server for real-time web search, content extraction, site mapping, and crawling.

Tools: tavily-extract, tavily-crawl

DuckDuckGo Search

Popular no-API-key MCP server for DuckDuckGo web search plus page fetching and parsing.

Tools: fetch_content

Jina AI Reader & Search

Official Jina AI remote MCP server: read web pages as Markdown and run grounded web search over HTTP.

Tools: extract_pdf

Kagi Search

Official Kagi MCP server (Python/uvx): privacy-first web search and URL/video summarization.

Tools: kagi_extract

AgentQL MCP

Turn any web page into structured data: AgentQL's prompt-driven extraction as a single MCP tool.

Tools: extract-web-data

Hyperbrowser MCP

Hyperbrowser's cloud browser MCP: scrape, crawl, extract structured data, run CUA/Claude/Browser-Use agents.

Tools: scrape_webpage, crawl_webpages, extract_structured_data

FAQ

Search vs scraping MCP — what's the difference?

Search returns ranked result links/snippets; scraping fetches and cleans the actual page content (often as markdown). RAG pipelines usually need both.
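The division of labour can be sketched as a two-step pipeline: search produces candidate URLs, scraping turns the top ones into usable text. Everything below is a stand-in; `SearchHit`, `ScrapedPage`, `fake_search`, and `fake_scrape` are hypothetical names, and in a real setup the two callables would wrap tool calls to a search server and a scraping server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchHit:
    """What a search tool returns: a pointer to a page, not the page."""
    url: str
    title: str
    snippet: str


@dataclass
class ScrapedPage:
    """What a scraping tool returns: the cleaned page content itself."""
    url: str
    markdown: str


def fake_search(query: str) -> list[SearchHit]:
    # Stand-in for a search tool call; returns ranked hits.
    return [
        SearchHit(f"https://example.com/{i}", f"Result {i}", f"About {query}")
        for i in range(1, 4)
    ]


def fake_scrape(url: str) -> ScrapedPage:
    # Stand-in for a scrape tool call; returns markdown for one URL.
    return ScrapedPage(url, f"# Page at {url}\n\nFull text goes here.")


def gather_context(
    query: str,
    search: Callable[[str], list[SearchHit]],
    scrape: Callable[[str], ScrapedPage],
    top_k: int = 2,
) -> list[ScrapedPage]:
    """Search first, then scrape only the top_k hits."""
    hits = search(query)[:top_k]
    return [scrape(hit.url) for hit in hits]


pages = gather_context("mcp scraping", fake_search, fake_scrape, top_k=2)
for page in pages:
    print(page.url, len(page.markdown))
```

Passing the two steps in as callables keeps the pipeline independent of which search and scraping servers sit behind them.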

Which handles JavaScript-heavy sites?

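Servers backed by a real browser or a rendering API (Firecrawl, Bright Data, Browserbase, Hyperbrowser) execute JavaScript before extracting content. Plain HTTP fetchers see only the initial HTML, so anything rendered client-side is missing from their output.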
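One rough way to tell whether a page needs a rendering server is to check how much visible text the raw HTML contains. The heuristic below is an assumption for illustration, not something any listed server is documented to do; the function name `probably_needs_js` and the 200-character threshold are arbitrary choices.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts script tags and the visible text outside script/style/noscript."""

    HIDDEN = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.visible_chars += len(data.strip())


def probably_needs_js(html: str, min_visible_chars: int = 200) -> bool:
    """Guess that a page is client-rendered: it has scripts but little text."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
static_page = "<html><body><p>" + "word " * 100 + "</p></body></html>"

print(probably_needs_js(spa_shell))
print(probably_needs_js(static_page))
```

A check like this can route pages that look static to a cheap plain fetcher and send the rest to a browser-backed server. It will misjudge some pages, so treat it as a first-pass filter only.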
