PDF Reader MCP

Extract text, images, and metadata from local or URL PDFs as structured output, dir-confined.

Verified

stdio (local)

No auth

TypeScript


Add to your client

Copy the config for your MCP client and paste it into its config file.

Install / run

npx -y @sylphx/pdf-reader-mcp

Paste into ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@sylphx/pdf-reader-mcp"
      ]
    }
  }
}


Before you start

Node.js 22.13.0 or higher (required by pdfjs-dist v6)
npx (bundled with Node.js) to run @sylphx/pdf-reader-mcp without a global install
No credentials or API key needed; the server uses no authentication
Optional: a directory to confine reads to (passed via --allow-dir), and network access if reading PDFs from URLs

About PDF Reader MCP

PDF Reader MCP is a Model Context Protocol server that lets AI agents extract text, images, metadata, and page counts from PDF files. It accepts local files (absolute or relative paths) and remote PDFs over HTTP/HTTPS, and it returns structured JSON rather than raw blobs, so agents get clean, ordered content back.

The server exposes a single unified read_pdf tool driven by boolean flags (include_full_text, include_images, include_metadata, include_page_count) and a per-source pages selector that supports ranges like "1-5,10-15,20". It uses Y-coordinate-based layout reconstruction to preserve natural reading order and can process multiple PDFs concurrently for speed.
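The Y-coordinate approach mentioned above can be illustrated with a minimal sketch. This is not the server's actual implementation: the `TextItem` shape, the `reconstructLines` name, and the 2-unit tolerance are all assumptions made for illustration, and the sketch assumes the usual PDF convention that y grows upward from the bottom of the page.

```typescript
// Hypothetical text fragment with a position on the page (assumed shape).
interface TextItem {
  text: string;
  x: number; // horizontal position
  y: number; // vertical position; larger y is assumed to be higher on the page
}

// Group fragments into lines by similar y, then order each line left to right.
function reconstructLines(items: TextItem[], tolerance = 2): string[] {
  // Top of the page first; ties broken by x.
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: { y: number; items: TextItem[] }[] = [];

  for (const item of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - item.y) <= tolerance) {
      last.items.push(item); // same visual line
    } else {
      lines.push({ y: item.y, items: [item] }); // start a new line
    }
  }

  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" ")
  );
}
```

Grouping by a tolerance rather than exact equality matters because fragments on the same visual line often differ by a fraction of a unit in their baseline.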

It is built on pdfjs-dist and ships with directory-confinement and host-allowlist controls, which make it safer to point at agent workspaces. It runs over stdio by default and can also be deployed as a remote HTTP server. It is compatible with Claude Desktop, Claude Code, Cursor, Windsurf, Cline, VS Code, and Warp.

Tools & capabilities (1)

read_pdf

Unified tool to extract text, images, metadata, and page count from one or more PDF sources (local paths or URLs), with optional per-source page-range selection.
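As a rough illustration of what a call might look like, the sketch below builds an argument object for read_pdf. The flag names, the `url` field, and the `pages` formats come from the description on this page; the `sources` wrapper and the `path` field name are assumptions, so check the tool's published schema before relying on them.

```typescript
// Hypothetical argument shape for read_pdf.
// `sources` and `path` are assumed names, not confirmed by this page.
interface PdfSource {
  path?: string; // local file, absolute or relative (assumed field name)
  url?: string; // HTTP/HTTPS PDF
  pages?: string | number[]; // e.g. "1-5,10-15,20" or [1, 2, 3]
}

interface ReadPdfArgs {
  sources: PdfSource[]; // assumed wrapper field
  include_full_text?: boolean;
  include_images?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
}

const exampleArgs: ReadPdfArgs = {
  sources: [
    { path: "reports/q3.pdf", pages: "1-5,10-15,20" },
    { url: "https://example.com/whitepaper.pdf", pages: [1, 2, 3] },
  ],
  include_metadata: true,
  include_page_count: true,
};

console.log(JSON.stringify(exampleArgs, null, 2));
```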

When to use it

Use it when an agent needs to read and summarize a local PDF report, contract, or invoice as structured text.
Use it when you want to pull specific page ranges out of a large PDF instead of the whole document.
Use it when you need to fetch and parse a PDF directly from a URL without downloading it yourself first.
Use it when you need PDF metadata (title, author, page count) for cataloging or routing logic.
Use it when you want to extract embedded images (as Base64 with dimensions) from a PDF for downstream processing.
Use it when you need to confine PDF access to a specific working directory for safety in an agent loop.

Quick setup

1. Ensure Node.js 22.13.0+ is installed.
2. Add the server to your MCP client config with command `npx` and args `["-y", "@sylphx/pdf-reader-mcp"]`.
3. Optionally pass `--allow-dir=/path/to/pdfs` (repeatable) to confine filesystem reads, and `--allow-host=domain` or `--no-http` to control URL access.
4. Restart your MCP client (e.g. Claude Desktop) so it picks up the new server.
5. Verify by asking the agent to read a known local PDF and confirm it returns text or metadata.
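The setup steps can be sketched as a small script that prints a client config with the optional flags included. The flags themselves are the ones named in the steps; placing them after the package name in `args`, and the values `/path/to/pdfs` and `example.com`, are illustrative assumptions.

```typescript
// Builds an MCP client config entry with optional confinement flags.
// Flag placement after the package name is an assumption; paths and hosts are placeholders.
const config = {
  mcpServers: {
    "pdf-reader-mcp": {
      command: "npx",
      args: [
        "-y",
        "@sylphx/pdf-reader-mcp",
        "--allow-dir=/path/to/pdfs", // confine filesystem reads
        "--allow-host=example.com", // restrict URL fetches to one host
      ],
    },
  },
};

// Paste the printed JSON into your client's config file.
console.log(JSON.stringify(config, null, 2));
```

To disable URL reads entirely, the `--allow-host` entry would be swapped for `--no-http`.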

Security notes

File access is confined to the working directory the host sets, so launch the server with its working directory scoped to the intended project folder, or name the allowed directories explicitly with `--allow-dir` or `MCP_PDF_ALLOWED_DIRS`. Parsing untrusted PDFs always carries some parser-exploitation risk; only process documents you trust. Note: use the current @sylphx/ package, not the older @sylphlab/ name.

PDF Reader MCP FAQ

Does it need an API key or authentication?

No. It runs locally over stdio with no auth, reading PDFs from the filesystem or from HTTP/HTTPS URLs you allow.

Can it read PDFs from a URL, not just local files?

Yes. Each source can specify a `url` field for HTTP/HTTPS PDFs. You can disable this with `--no-http` or restrict it to specific domains with `--allow-host`.
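To make the idea of a host allowlist concrete, here is a minimal sketch of such a check. It is not the server's code: the `isUrlAllowed` name, exact-hostname matching, and the rule that an empty allowlist permits any host are all assumptions chosen for illustration.

```typescript
// Hypothetical allowlist check for PDF URLs (illustrative only).
function isUrlAllowed(
  rawUrl: string,
  allowedHosts: string[],
  httpEnabled = true // false models the effect of --no-http
): boolean {
  if (!httpEnabled) return false;

  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL
  }

  // Only HTTP and HTTPS are in scope.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;

  // Assumption: no allowlist configured means any host is accepted.
  if (allowedHosts.length === 0) return true;

  // Assumption: exact hostname match, case-insensitive.
  const host = parsed.hostname.toLowerCase();
  return allowedHosts.some((h) => h.toLowerCase() === host);
}
```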

How do I restrict which directories it can read?

Pass `--allow-dir=/path` (repeatable) or set the `MCP_PDF_ALLOWED_DIRS` environment variable. Reads outside allowed directories fail fast with an Access denied error.
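Directory confinement of this kind is commonly implemented by resolving the requested path and checking that it stays under an allowed root. The sketch below shows that general technique under stated assumptions; `isPathAllowed` is a hypothetical helper, not the server's API, and it does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical confinement check: true if `target` resolves inside any allowed directory.
function isPathAllowed(target: string, allowedDirs: string[]): boolean {
  const resolved = path.resolve(target); // normalizes "." and ".." segments

  return allowedDirs.some((dir) => {
    const root = path.resolve(dir);
    const rel = path.relative(root, resolved);
    // Inside the root if the relative path does not climb out of it.
    return (
      rel === "" ||
      (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel))
    );
  });
}
```

Comparing via `path.relative` rather than a plain string prefix avoids treating `/data/pdfs-other` as if it were inside `/data/pdfs`.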

Can I extract only certain pages?

Yes. Each source accepts a `pages` parameter using ranges like "1-5,10-15,20" or an explicit array like [1,2,3].
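For readers curious how a range string like this expands, the following is a minimal sketch of a parser. The `parsePageSpec` name and its error handling are assumptions for illustration; the server's own parsing rules may differ (for example, in how it treats out-of-range pages).

```typescript
// Hypothetical parser: expands "1-5,10-15,20" or [1, 2, 3] into sorted, unique page numbers.
function parsePageSpec(spec: string | number[]): number[] {
  const pages = new Set<number>();

  if (Array.isArray(spec)) {
    for (const p of spec) pages.add(p);
  } else {
    for (const part of spec.split(",")) {
      const token = part.trim();
      if (token === "") continue; // tolerate stray commas

      const m = /^(\d+)(?:-(\d+))?$/.exec(token);
      if (!m) throw new Error(`Invalid page token: "${token}"`);

      const start = Number(m[1]);
      const end = m[2] !== undefined ? Number(m[2]) : start;
      if (start < 1 || end < start) {
        throw new Error(`Invalid page range: "${token}"`);
      }
      for (let p = start; p <= end; p++) pages.add(p);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}
```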

Why is it returning an error about Node version?

pdfjs-dist v6 requires Node.js 22.13.0 or higher. Upgrade Node if you see engine or module errors on startup.
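A quick way to rule out a version problem is to compare your Node.js version against the stated minimum. The helper below is a hypothetical sketch (the `meetsMinimum` name is not part of the package) that compares dotted version strings numerically.

```typescript
// Hypothetical helper: true if `version` is at least `minimum` (major.minor.patch).
function meetsMinimum(version: string, minimum = "22.13.0"): boolean {
  const parse = (v: string): number[] =>
    v
      .replace(/^v/, "") // accept "v22.13.0" as well as "22.13.0"
      .split(".")
      .map((n) => parseInt(n, 10) || 0);

  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // versions are equal
}
```

In a Node.js script, passing `process.versions.node` as the first argument checks the running interpreter.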

#pdf #documents #text-extraction #community #local-files

Alternatives to PDF Reader MCP


Filesystem (Reference)

Files & Storage

74k

Official MCP reference server for secure local filesystem read/write within allowed directories.

Verified

stdio (local)

No auth

TypeScript

13 tools

Updated 5 months ago

Git (Reference)

Files & Storage

74k

Official MCP server for reading, searching, and manipulating a local Git repository's files and history.

Verified

stdio (local)

No auth

Python

12 tools

Updated 5 months ago

AWS S3 Tables MCP Server

Files & Storage

9.2k

Official AWS Labs MCP server to manage and query S3 Tables (table buckets, namespaces, tables).

Verified

stdio (local)

API key

Python

12 tools

Updated 1 month ago
