Align crawl docs with polling behavior

hmishra2250 · hmishra2250 · commit d70396c2d682 · 2026-06-26T09:50:09.000+05:30
Constraint: firecrawl_crawl starts a server-side crawl and polls until a terminal result before returning.
Rejected: Describing crawl as only returning an operation ID | that misleads agents into unnecessary status polling.
Confidence: high
Scope-risk: narrow
Directive: Keep tool descriptions synchronized with execution semantics.
Tested: npm run build.
Not-tested: pnpm run build is blocked because the existing pnpm-workspace.yaml in this checkout is missing its packages field.
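
Illustration: the Constraint above describes a start-then-poll flow. The sketch below shows one way such a loop could look. It is not the code in src/index.ts: `crawlAndWait`, `isTerminal`, the `start`/`getStatus` callbacks, the status names, and the interval and poll-cap defaults are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical status names; the real crawl API may report different values.
type CrawlStatus = "scraping" | "completed" | "failed" | "cancelled";

// Subset of the fields the updated README lists for the final result.
interface CrawlSnapshot {
  id: string;
  status: CrawlStatus;
  completed: number;
  total: number;
  data: unknown[];
}

// Assumed set of states after which polling should stop.
const TERMINAL_STATES: CrawlStatus[] = ["completed", "failed", "cancelled"];

function isTerminal(status: CrawlStatus): boolean {
  return TERMINAL_STATES.indexOf(status) !== -1;
}

// Starts a job via the injected `start` callback, then polls `getStatus`
// until the job reports a terminal state or the poll cap is reached.
async function crawlAndWait(
  start: () => Promise<string>,
  getStatus: (id: string) => Promise<CrawlSnapshot>,
  intervalMs: number = 2000, // assumed delay between polls
  maxPolls: number = 300, // assumed safety cap so the loop cannot run forever
): Promise<CrawlSnapshot> {
  const id = await start();
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const snapshot = await getStatus(id);
    if (isTerminal(snapshot.status)) {
      return snapshot; // the caller receives the final status and data directly
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Crawl ${id} did not reach a terminal state after ${maxPolls} polls`);
}
```

Because the caller only ever sees a terminal snapshot, a tool description that promises "an operation ID for status checking" would not match this shape, which is the mismatch the commit addresses.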
diff --git a/README.md b/README.md
@@ -354,7 +354,7 @@ Use this guide to select the right tool for your task:
 | interact     | Interact with a URL or scraped page            | Execution result + scrapeId for URL mode |
 | batch_scrape | Multiple known URLs                            | JSON (preferred) or markdown[] |
 | map          | Discovering URLs on a site                     | URL[]                          |
-| crawl        | Multi-page extraction (with limits)            | markdown/html[]                |
+| crawl        | Multi-page extraction (with limits)            | final crawl status/data after internal polling |
 | search       | Web search for info                            | results[]                      |
 | agent        | Complex multi-source research                  | JSON (structured data)         |
 
@@ -680,7 +680,7 @@ and small metadata objects. Do not include raw scrape/parse outputs.
 
 ### 6. Crawl Tool (`firecrawl_crawl`)
 
-Starts an asynchronous crawl job on a website and extract content from all pages.
+Starts a crawl job, polls until it reaches a terminal state, and returns the final crawl status/data.
 
 **Best for:**
 
@@ -689,14 +689,14 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Not recommended for:**
 
 - Extracting content from a single page (use scrape)
-- When token limits are a concern (use map + batch_scrape)
+- When token limits are a concern (use map + scrape for tighter control)
 - When you need fast results (crawling can be slow)
 
-**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.
 
 **Common mistakes:**
 
-- Setting limit or maxDepth too high (causes token overflow)
+- Setting limit or maxDiscoveryDepth too high (causes token overflow)
 - Using crawl for a single page (use scrape instead)
 
 **Prompt Example:**
@@ -710,33 +710,22 @@ Starts an asynchronous crawl job on a website and extract content from all pages
   "name": "firecrawl_crawl",
   "arguments": {
     "url": "https://example.com/blog/*",
-    "maxDepth": 2,
+    "maxDiscoveryDepth": 2,
     "limit": 100,
     "allowExternalLinks": false,
     "deduplicateSimilarURLs": true
   }
 }
 ```
 
-**Returns:**
 
-- Response includes operation ID for status checking:
+**Returns:**
 
-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
-    }
-  ],
-  "isError": false
-}
-```
+- Final crawl status and data after internal polling, including `id`, `status`, `completed`, `total`, `creditsUsed`, `expiresAt`, `next`, and `data`. Use the returned `id` with `firecrawl_check_crawl_status` if you need to re-check the job later.
 
 ### 7. Check Crawl Status (`firecrawl_check_crawl_status`)
 
-Check the status of a crawl job.
+Check the status and results of an existing crawl job by ID.
 
 ```json
 {
diff --git a/src/index.ts b/src/index.ts
@@ -1823,17 +1823,17 @@ Do not store multi-MB outputs in feedback. Use concise notes, issue codes, URLs,
 server.addTool({
   name: 'firecrawl_crawl',
   annotations: {
-    title: 'Start a site crawl',
-    readOnlyHint: false, // Starts an asynchronous crawl job, creating a persistent server-side job.
+    title: 'Run a site crawl',
+    readOnlyHint: false, // Starts a server-side crawl job and polls until the job reaches a terminal state.
     openWorldHint: true, // Crawls user-specified URLs across the public web.
     destructiveHint: false, // Reads pages from target sites; does not delete or alter external websites.
   },
   description: `
- Starts a crawl job on a website and extracts content from all pages.
+ Starts a crawl job on a website, polls until it reaches a terminal state, and returns the final crawl status/data.
  
  **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
- **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
- **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+ **Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
+ **Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.
  **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
  **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
  **Usage Example:**
@@ -1850,7 +1850,7 @@ server.addTool({
    }
  }
  \`\`\`
- **Returns:** Operation ID for status checking; use firecrawl_check_crawl_status to check progress.
+ **Returns:** Final crawl status and data after internal polling, including the crawl id. Use firecrawl_check_crawl_status only when you need to re-check an existing crawl ID later.
  ${
    SAFE_MODE
      ? '**Safe Mode:** Read-only crawling. Webhooks and interactive actions are disabled for security.'