Align crawl tool copy with polling behavior

hmishra2250 · web-flow · commit 5b5f8e9dba65 · 2026-06-27T01:32:02.000+05:30
Constraint: firecrawl_crawl already polls to a terminal status and returns final crawl data.
Rejected: Describing crawl as only returning an operation ID | that encourages unnecessary follow-up polling.
Confidence: high
Scope-risk: narrow
Directive: Keep runtime-visible tool descriptions aligned with actual execution semantics.
Tested: npm run build; GitHub build check passed; independent review found no newly introduced runtime issues.
Not-tested: Full manual MCP crawl trace.
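
Illustration: the polling behavior named in the Constraint above can be sketched roughly as follows. This is a minimal sketch, not the server's actual implementation. `CrawlStatus`, `pollUntilTerminal`, the injected `getStatus` callback, the interval and attempt defaults, and the set of terminal status strings are all assumptions made for illustration.

```typescript
// Hypothetical shape, loosely modeled on the fields listed in the README change.
type CrawlStatus = {
  id: string;
  status: string;
  completed: number;
  total: number;
  data: unknown[];
};

// Assumed terminal states; the real service may use different values.
const TERMINAL_STATES = new Set(['completed', 'failed', 'cancelled']);

// Polls getStatus until the job reports a terminal state, then returns
// that final status object. Throws if maxAttempts polls are exhausted.
async function pollUntilTerminal(
  getStatus: (id: string) => Promise<CrawlStatus>,
  id: string,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<CrawlStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await getStatus(id);
    if (TERMINAL_STATES.has(current.status)) {
      return current;
    }
    // Wait before the next poll so the status endpoint is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(
    `Crawl ${id} did not reach a terminal state after ${maxAttempts} polls`,
  );
}
```

Because the caller receives the final status object directly, a follow-up status check is only needed to revisit the same job ID later, which is the distinction the updated tool copy draws.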
diff --git a/README.md b/README.md
@@ -351,7 +351,7 @@ Use this guide to select the right tool for your task:
 | scrape       | Single page content                            | JSON (preferred) or markdown   |
 | interact     | Interact with a URL or scraped page            | Execution result + scrapeId for URL mode |
 | map          | Discovering URLs on a site                     | URL[]                          |
-| crawl        | Multi-page extraction (with limits)            | markdown/html[]                |
+| crawl        | Multi-page extraction (with limits)            | final crawl status/data after internal polling |
 | search       | Web search for info                            | results[]                      |
 | agent        | Complex multi-source research                  | JSON (structured data)         |
 
@@ -612,7 +612,7 @@ and small metadata objects. Do not include raw scrape/parse outputs.
 
 ### 4. Crawl Tool (`firecrawl_crawl`)
 
-Starts an asynchronous crawl job on a website and extract content from all pages.
+Starts a crawl job, polls until it reaches a terminal state, and returns the final crawl status/data.
 
 **Best for:**
 
@@ -628,7 +628,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 
 **Common mistakes:**
 
-- Setting limit or maxDepth too high (causes token overflow)
+- Setting limit or maxDiscoveryDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)
 
 **Prompt Example:**
@@ -642,33 +642,22 @@ Starts an asynchronous crawl job on a website and extract content from all pages
   "name": "firecrawl_crawl",
   "arguments": {
     "url": "https://example.com/blog/*",
-    "maxDepth": 2,
+    "maxDiscoveryDepth": 2,
     "limit": 100,
     "allowExternalLinks": false,
     "deduplicateSimilarURLs": true
   }
 }
 ```
 
-**Returns:**
 
-- Response includes operation ID for status checking:
+**Returns:**
 
-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
-    }
-  ],
-  "isError": false
-}
-```
+- Final crawl status and data after internal polling, including `id`, `status`, `completed`, `total`, `creditsUsed`, `expiresAt`, `next`, and `data`. Use the returned `id` with `firecrawl_check_crawl_status` if you need to re-check the job later.
 
 ### 5. Check Crawl Status (`firecrawl_check_crawl_status`)
 
-Check the status of a crawl job.
+Check the status and results of an existing crawl job by ID.
 
 ```json
 {
diff --git a/src/index.ts b/src/index.ts
@@ -1823,13 +1823,13 @@ Do not store multi-MB outputs in feedback. Use concise notes, issue codes, URLs,
 server.addTool({
   name: 'firecrawl_crawl',
   annotations: {
-    title: 'Start a site crawl',
-    readOnlyHint: false, // Starts an asynchronous crawl job, creating a persistent server-side job.
+    title: 'Run a site crawl',
+    readOnlyHint: false, // Starts a server-side crawl job and polls until the job reaches a terminal state.
     openWorldHint: true, // Crawls user-specified URLs across the public web.
     destructiveHint: false, // Reads pages from target sites; does not delete or alter external websites.
   },
   description: `
- Starts a crawl job on a website and extracts content from all pages.
+ Starts a crawl job on a website, polls until it reaches a terminal state, and returns the final crawl status/data.
  
  **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
  **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
@@ -1850,7 +1850,7 @@ server.addTool({
    }
  }
  \`\`\`
- **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
+ **Returns:** Final crawl status and data after internal polling, including the crawl id. Use firecrawl_check_crawl_status only when you need to re-check an existing crawl ID later.
  ${
    SAFE_MODE
      ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'