Commit 5b5f8e9

Align crawl tool copy with polling behavior
Constraint: firecrawl_crawl already polls to a terminal status and returns final crawl data.
Rejected: Describing crawl as only returning an operation ID | that encourages unnecessary follow-up polling.
Confidence: high
Scope-risk: narrow
Directive: Keep runtime-visible tool descriptions aligned with actual execution semantics.
Tested: npm run build; GitHub build check passed; independent review found no introduced runtime issue.
Not-tested: Full manual MCP crawl trace not run.
1 parent 871f178 commit 5b5f8e9
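
The commit message says `firecrawl_crawl` polls to a terminal status before returning. The following is a minimal sketch of that pattern, not the code in `src/index.ts`. The status values, the `pollUntilTerminal` helper, the polling interval, and the attempt cap are all assumptions made for illustration; only the `id`, `status`, `completed`, and `total` field names come from this commit.

```typescript
// Hypothetical status values: the commit only says "terminal status".
type CrawlStatus = "scraping" | "completed" | "failed" | "cancelled";

// Subset of the fields named in the README change; types are assumed.
interface CrawlSnapshot {
  id: string;
  status: CrawlStatus;
  completed: number;
  total: number;
}

// Assumed set of statuses after which no further progress is expected.
const TERMINAL_STATUSES: ReadonlySet<CrawlStatus> = new Set<CrawlStatus>([
  "completed",
  "failed",
  "cancelled",
]);

function isTerminal(snapshot: CrawlSnapshot): boolean {
  return TERMINAL_STATUSES.has(snapshot.status);
}

// Calls fetchStatus repeatedly until a terminal snapshot is seen,
// waiting intervalMs between attempts and giving up after maxAttempts.
async function pollUntilTerminal(
  fetchStatus: () => Promise<CrawlSnapshot>,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<CrawlSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await fetchStatus();
    if (isTerminal(snapshot)) {
      return snapshot;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`crawl did not reach a terminal status after ${maxAttempts} polls`);
}
```

Because the tool already loops like this internally, a caller that receives its result has no reason to start a second polling loop, which is the behavior the old copy encouraged.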

2 files changed

Lines changed: 11 additions & 22 deletions

README.md

Lines changed: 7 additions & 18 deletions
@@ -351,7 +351,7 @@ Use this guide to select the right tool for your task:
 | scrape | Single page content | JSON (preferred) or markdown |
 | interact | Interact with a URL or scraped page | Execution result + scrapeId for URL mode |
 | map | Discovering URLs on a site | URL[] |
-| crawl | Multi-page extraction (with limits) | markdown/html[] |
+| crawl | Multi-page extraction (with limits) | final crawl status/data after internal polling |
 | search | Web search for info | results[] |
 | agent | Complex multi-source research | JSON (structured data) |

@@ -612,7 +612,7 @@ and small metadata objects. Do not include raw scrape/parse outputs.

 ### 4. Crawl Tool (`firecrawl_crawl`)

-Starts an asynchronous crawl job on a website and extract content from all pages.
+Starts a crawl job, polls until it reaches a terminal state, and returns the final crawl status/data.

 **Best for:**

@@ -628,7 +628,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages

 **Common mistakes:**

-- Setting limit or maxDepth too high (causes token overflow)
+- Setting limit or maxDiscoveryDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)

 **Prompt Example:**
@@ -642,33 +642,22 @@ Starts an asynchronous crawl job on a website and extract content from all pages
   "name": "firecrawl_crawl",
   "arguments": {
     "url": "https://example.com/blog/*",
-    "maxDepth": 2,
+    "maxDiscoveryDepth": 2,
     "limit": 100,
     "allowExternalLinks": false,
     "deduplicateSimilarURLs": true
   }
 }
 ```

-**Returns:**

-- Response includes operation ID for status checking:
+**Returns:**

-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
-    }
-  ],
-  "isError": false
-}
-```
+- Final crawl status and data after internal polling, including `id`, `status`, `completed`, `total`, `creditsUsed`, `expiresAt`, `next`, and `data`. Use the returned `id` with `firecrawl_check_crawl_status` if you need to re-check the job later.

 ### 5. Check Crawl Status (`firecrawl_check_crawl_status`)

-Check the status of a crawl job.
+Check the status and results of an existing crawl job by ID.

 ```json
 {

src/index.ts

Lines changed: 4 additions & 4 deletions
@@ -1823,13 +1823,13 @@ Do not store multi-MB outputs in feedback. Use concise notes, issue codes, URLs,
 server.addTool({
   name: 'firecrawl_crawl',
   annotations: {
-    title: 'Start a site crawl',
-    readOnlyHint: false, // Starts an asynchronous crawl job, creating a persistent server-side job.
+    title: 'Run a site crawl',
+    readOnlyHint: false, // Starts a server-side crawl job and polls until the job reaches a terminal state.
     openWorldHint: true, // Crawls user-specified URLs across the public web.
     destructiveHint: false, // Reads pages from target sites; does not delete or alter external websites.
   },
   description: `
-Starts a crawl job on a website and extracts content from all pages.
+Starts a crawl job on a website, polls until it reaches a terminal state, and returns the final crawl status/data.

 **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
 **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
@@ -1850,7 +1850,7 @@ server.addTool({
   }
 }
 \`\`\`
-**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
+**Returns:** Final crawl status and data after internal polling, including the crawl id. Use firecrawl_check_crawl_status only when you need to re-check an existing crawl ID later.
 ${
   SAFE_MODE
     ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
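
The new Returns text in both files lists the fields a caller can expect back from `firecrawl_crawl`. As a rough illustration of consuming that result, the sketch below types those fields and decides whether a later `firecrawl_check_crawl_status` call is worthwhile. The field names come from the README change; the field types, the `"completed"` status string, and the `needsFollowUp` and `summarize` helpers are assumptions for illustration, not part of this repository.

```typescript
// Field names are taken from the README text in this commit; types are assumed.
interface CrawlResult {
  id: string;
  status: string;
  completed: number;
  total: number;
  creditsUsed: number;
  expiresAt: string;
  next?: string | null;
  data: unknown[];
}

// Hypothetical heuristic: a follow-up check is only useful when the job
// did not finish cleanly or more results appear to be pending.
function needsFollowUp(result: CrawlResult): boolean {
  return (
    result.status !== "completed" ||
    result.completed < result.total ||
    Boolean(result.next)
  );
}

// Builds a one-line summary suitable for logs or a tool-call trace.
function summarize(result: CrawlResult): string {
  const base =
    `crawl ${result.id}: ${result.status}, ` +
    `${result.completed}/${result.total} pages, ` +
    `${result.data.length} documents returned`;
  return needsFollowUp(result)
    ? `${base}; re-check with firecrawl_check_crawl_status`
    : base;
}
```

Under these assumptions, a fully completed crawl yields a summary with no follow-up hint, which matches the commit's intent that `firecrawl_check_crawl_status` is reserved for re-checking an existing crawl ID later.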
