
Commit d70396c

Align crawl docs with polling behavior
Constraint: firecrawl_crawl starts a server-side crawl and polls until a terminal result before returning.
Rejected: Describing crawl as only returning an operation ID | that misleads agents into unnecessary status polling.
Confidence: high
Scope-risk: narrow
Directive: Keep tool descriptions synchronized with execution semantics.
Tested: npm run build.
Not-tested: pnpm run build is blocked by existing pnpm-workspace.yaml missing packages field in this checkout.
1 parent e744bba commit d70396c
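The Constraint trailer above describes the behavior this commit documents: `firecrawl_crawl` starts a server-side crawl and keeps polling until the job reaches a terminal state, so the caller gets the final result from one call instead of an operation ID. A minimal TypeScript sketch of that start-then-poll loop might look like this. It is not the server's code: `startCrawl` and `getStatus` are hypothetical injected functions, and the state names, poll interval, and poll cap are assumed values.

```typescript
// Assumed crawl states; the real server's state names may differ.
type CrawlState = "scraping" | "completed" | "failed" | "cancelled";

// Subset of the status fields named in the README's Returns section (types assumed).
interface CrawlStatus {
  id: string;
  status: CrawlState;
  completed: number;
  total: number;
}

// States after which the job is assumed not to change any more.
const TERMINAL_STATES = new Set<CrawlState>(["completed", "failed", "cancelled"]);

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start a crawl, then poll until the job is terminal and return that final status.
// `startCrawl` and `getStatus` are hypothetical and injected, so the sketch stays
// independent of any particular client library.
async function crawlAndWait(
  startCrawl: () => Promise<string>,
  getStatus: (id: string) => Promise<CrawlStatus>,
  pollIntervalMs = 2000,
  maxPolls = 300,
): Promise<CrawlStatus> {
  const id = await startCrawl();
  for (let poll = 0; poll < maxPolls; poll++) {
    const status = await getStatus(id);
    if (TERMINAL_STATES.has(status.status)) {
      return status; // the caller receives the final result from this one call
    }
    await sleep(pollIntervalMs);
  }
  throw new Error(`crawl ${id} did not reach a terminal state after ${maxPolls} polls`);
}
```

Capping the number of polls keeps a stuck job from blocking the caller indefinitely; a real implementation could use a wall-clock timeout or backoff instead.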

2 files changed

Lines changed: 15 additions & 26 deletions


README.md

Lines changed: 9 additions & 20 deletions
@@ -354,7 +354,7 @@ Use this guide to select the right tool for your task:
 | interact | Interact with a URL or scraped page | Execution result + scrapeId for URL mode |
 | batch_scrape | Multiple known URLs | JSON (preferred) or markdown[] |
 | map | Discovering URLs on a site | URL[] |
-| crawl | Multi-page extraction (with limits) | markdown/html[] |
+| crawl | Multi-page extraction (with limits) | final crawl status/data after internal polling |
 | search | Web search for info | results[] |
 | agent | Complex multi-source research | JSON (structured data) |

@@ -680,7 +680,7 @@ and small metadata objects. Do not include raw scrape/parse outputs.

 ### 6. Crawl Tool (`firecrawl_crawl`)

-Starts an asynchronous crawl job on a website and extract content from all pages.
+Starts a crawl job, polls until it reaches a terminal state, and returns the final crawl status/data.

 **Best for:**

@@ -689,14 +689,14 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Not recommended for:**

 - Extracting content from a single page (use scrape)
-- When token limits are a concern (use map + batch_scrape)
+- When token limits are a concern (use map + scrape for tighter control)
 - When you need fast results (crawling can be slow)

-**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.

 **Common mistakes:**

-- Setting limit or maxDepth too high (causes token overflow)
+- Setting limit or maxDiscoveryDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)

 **Prompt Example:**
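The updated guidance steers token-conscious callers away from one large crawl and toward map + scrape. Assuming map returns a list of URLs and scrape returns one page's content, the pattern can be sketched as follows. `MapFn`, `ScrapeFn`, and `mapThenScrape` are hypothetical stand-ins, not Firecrawl client APIs.

```typescript
// Hypothetical signatures standing in for the map and scrape tools.
type MapFn = (siteUrl: string) => Promise<string[]>;
type ScrapeFn = (pageUrl: string) => Promise<string>;

// Discover URLs with map, keep those under `pathPrefix`, cap the count, then
// scrape each selected page. The caller decides how much content comes back.
async function mapThenScrape(
  map: MapFn,
  scrape: ScrapeFn,
  siteUrl: string,
  pathPrefix: string,
  maxPages: number,
): Promise<Map<string, string>> {
  const discovered = await map(siteUrl);
  const selected = Array.from(new Set(discovered)) // drop duplicate URLs, keep first-seen order
    .filter((url) => url.startsWith(pathPrefix))
    .slice(0, maxPages);

  const pages = new Map<string, string>();
  for (const url of selected) {
    pages.set(url, await scrape(url)); // one page at a time
  }
  return pages;
}
```

Scraping one page at a time is deliberately simple; the point is that the caller, not the crawler, chooses how many pages (and therefore roughly how many tokens) come back.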
@@ -710,33 +710,22 @@ Starts an asynchronous crawl job on a website and extract content from all pages
   "name": "firecrawl_crawl",
   "arguments": {
     "url": "https://example.com/blog/*",
-    "maxDepth": 2,
+    "maxDiscoveryDepth": 2,
     "limit": 100,
     "allowExternalLinks": false,
     "deduplicateSimilarURLs": true
   }
 }
 ```

-**Returns:**

-- Response includes operation ID for status checking:
+**Returns:**

-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
-    }
-  ],
-  "isError": false
-}
-```
+- Final crawl status and data after internal polling, including `id`, `status`, `completed`, `total`, `creditsUsed`, `expiresAt`, `next`, and `data`. Use the returned `id` with `firecrawl_check_crawl_status` if you need to re-check the job later.

 ### 7. Check Crawl Status (`firecrawl_check_crawl_status`)

-Check the status of a crawl job.
+Check the status and results of an existing crawl job by ID.

 ```json
 {
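The new **Returns** text lists the fields of the final crawl result. The sketch below models that result in TypeScript and reduces it to a compact summary that keeps the `id` for a later `firecrawl_check_crawl_status` call. Only the field names come from the README; their types, the `CrawlDocument` shape (`markdown`, `metadata.sourceURL`), and treating a non-null `next` as "more result pages remain" are assumptions.

```typescript
// Assumed document shape inside `data`; the real response may carry more fields.
interface CrawlDocument {
  markdown?: string;
  metadata?: { sourceURL?: string };
}

// Field names from the README's Returns list; all types are assumptions.
interface FinalCrawlResult {
  id: string;
  status: string; // e.g. "completed" (assumed value)
  completed: number; // pages crawled so far (assumed meaning)
  total: number; // pages found for the job (assumed meaning)
  creditsUsed: number;
  expiresAt: string; // assumed to be an ISO-8601 timestamp
  next?: string | null; // assumed: set when more result pages remain
  data: CrawlDocument[];
}

// Reduce a possibly large crawl result to a compact summary, keeping the id
// so the job can be re-checked later with firecrawl_check_crawl_status.
function summarizeCrawl(result: FinalCrawlResult) {
  return {
    id: result.id,
    status: result.status,
    progress: `${result.completed}/${result.total}`,
    sources: result.data
      .map((doc) => doc.metadata?.sourceURL)
      .filter((url): url is string => typeof url === "string"),
    hasMoreResults: result.next != null,
  };
}

// Example input (made-up values).
const summary = summarizeCrawl({
  id: "550e8400-e29b-41d4-a716-446655440000",
  status: "completed",
  completed: 2,
  total: 2,
  creditsUsed: 2,
  expiresAt: "2030-01-01T00:00:00Z",
  next: null,
  data: [
    { markdown: "# Post A", metadata: { sourceURL: "https://example.com/blog/a" } },
    { markdown: "# Post B", metadata: { sourceURL: "https://example.com/blog/b" } },
  ],
});
console.log(summary.progress, summary.hasMoreResults); // 2/2 false
```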

src/index.ts

Lines changed: 6 additions & 6 deletions
@@ -1823,17 +1823,17 @@ Do not store multi-MB outputs in feedback. Use concise notes, issue codes, URLs,
 server.addTool({
   name: 'firecrawl_crawl',
   annotations: {
-    title: 'Start a site crawl',
-    readOnlyHint: false, // Starts an asynchronous crawl job, creating a persistent server-side job.
+    title: 'Run a site crawl',
+    readOnlyHint: false, // Starts a server-side crawl job and polls until the job reaches a terminal state.
     openWorldHint: true, // Crawls user-specified URLs across the public web.
     destructiveHint: false, // Reads pages from target sites; does not delete or alter external websites.
   },
   description: `
-Starts a crawl job on a website and extracts content from all pages.
+Starts a crawl job on a website, polls until it reaches a terminal state, and returns the final crawl status/data.

 **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
-**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
-**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
+**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.
 **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
 **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
 **Usage Example:**
@@ -1850,7 +1850,7 @@ server.addTool({
   }
 }
 \`\`\`
-**Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
+**Returns:** Final crawl status and data after internal polling, including the crawl id. Use firecrawl_check_crawl_status only when you need to re-check an existing crawl ID later.
 ${
   SAFE_MODE
     ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'
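The commit's Directive asks that tool descriptions stay synchronized with execution semantics. One way to enforce that, sketched here as a hypothetical check that does not exist in the repository, is to scan the crawl description for phrases this commit removed and for the new polling wording. The function name, phrase list, and messages are all illustrative assumptions.

```typescript
// Hypothetical consistency check, not part of the firecrawl MCP server.
interface StalePhrase {
  pattern: RegExp;
  message: string;
}

// Phrases this commit removed from the crawl docs and tool description.
const STALE_PHRASES: StalePhrase[] = [
  { pattern: /Operation ID for status checking/i, message: "still promises only an operation ID" },
  { pattern: /use map \+ batch_scrape/i, message: "still recommends map + batch_scrape" },
  { pattern: /\bmaxDepth\b/, message: "uses maxDepth instead of maxDiscoveryDepth" },
];

// Wording the updated description is expected to contain.
const REQUIRED_PHRASE = /polls until it reaches a terminal state/i;

// Return a list of problems; an empty list means the description looks in sync.
function checkCrawlDescription(description: string): string[] {
  const problems = STALE_PHRASES
    .filter(({ pattern }) => pattern.test(description))
    .map(({ message }) => message);
  if (!REQUIRED_PHRASE.test(description)) {
    problems.push("does not mention polling to a terminal state");
  }
  return problems;
}
```

A check like this could run in a unit test against the `description` string passed to `server.addTool`, so the docs drift is caught at build time rather than by agents.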
