Commit 871f178

Stop advertising nonexistent MCP batch tooling
Constraint: The MCP server does not expose firecrawl_batch_scrape or firecrawl_check_batch_status.
Rejected: Adding MCP batch tools now | expanding tool count needs product/experiment approval.
Confidence: high
Scope-risk: narrow
Directive: Keep README and runtime-visible tool descriptions limited to registered MCP tools.
Tested: npm run build; GitHub build check passed; independent re-review approved.
Not-tested: Full manual MCP trace not run.
1 parent e744bba commit 871f178
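
The directive in the commit message (keep documentation limited to registered MCP tools) could be checked mechanically. The following is a minimal sketch under stated assumptions: `findUnregisteredTools`, the sample tool set, and the sample README text are hypothetical illustrations and are not part of this commit or of the Firecrawl MCP server.

```typescript
// Hypothetical helper (not part of the Firecrawl MCP codebase): returns
// firecrawl_* tool names mentioned in documentation text that are not
// in the set of registered tool names.
function findUnregisteredTools(
  docText: string,
  registered: ReadonlySet<string>,
): string[] {
  // Match identifiers such as firecrawl_map; the name must end in a letter,
  // so wildcard mentions like firecrawl_monitor_* are skipped.
  const matches: string[] = docText.match(/\bfirecrawl_[a-z_]*[a-z]\b/g) ?? [];
  const mentioned = Array.from(new Set(matches));
  return mentioned.filter((name) => !registered.has(name)).sort();
}

// Assumed sample data for illustration only.
const registeredTools = new Set([
  "firecrawl_map",
  "firecrawl_search",
  "firecrawl_crawl",
]);
const readmeText =
  "Use firecrawl_map to discover URLs. " +
  "Stale: firecrawl_batch_scrape, firecrawl_check_batch_status.";

const stale = findUnregisteredTools(readmeText, registeredTools);
console.log(stale);
```

With the sample inputs, `stale` holds the two batch tool names this commit removes from the README. A real check would need to build the registered set from the server's actual tool registrations, which this sketch does not attempt.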

2 files changed

Lines changed: 22 additions & 91 deletions

README.md

Lines changed: 20 additions & 89 deletions
@@ -324,22 +324,20 @@ These configurations control:
 - Warning at 1000 credits remaining
 - Critical alert at 100 credits remaining

-### Rate Limiting and Batch Processing
+### Rate Limiting

-The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:
+The server uses Firecrawl's built-in rate limiting:

 - Automatic rate limit handling with exponential backoff
-- Efficient parallel processing for batch operations
 - Smart request queuing and throttling
 - Automatic retries for transient errors

 ## How to Choose a Tool

 Use this guide to select the right tool for your task:

-- **If you know the exact URL(s) you want:**
-- For one: use **scrape** (with JSON format for structured data)
-- For many: use **batch_scrape**
+- **If you know the exact URL you want:** use **scrape** (with JSON format for structured data)
+- **If you have multiple known URLs:** call **scrape** for each URL. If you specifically need one bulk API operation, use the Firecrawl API batch endpoint outside MCP.
 - **If you need to discover URLs on a site:** use **map**
 - **If you want to search the web for info:** use **search**
 - **If you need complex research across multiple unknown sources:** use **agent**
@@ -352,15 +350,14 @@ Use this guide to select the right tool for your task:
 | ------------ | ---------------------------------------------- | ------------------------------ |
 | scrape | Single page content | JSON (preferred) or markdown |
 | interact | Interact with a URL or scraped page | Execution result + scrapeId for URL mode |
-| batch_scrape | Multiple known URLs | JSON (preferred) or markdown[] |
 | map | Discovering URLs on a site | URL[] |
 | crawl | Multi-page extraction (with limits) | markdown/html[] |
 | search | Web search for info | results[] |
 | agent | Complex multi-source research | JSON (structured data) |

 ### Format Selection Guide

-When using `scrape` or `batch_scrape`, choose the right format:
+When using `scrape`, choose the right format:

 - **JSON format (recommended for most cases):** Use when you need specific data from a page. Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
 - **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.
@@ -377,12 +374,12 @@ Scrape content from a single URL with advanced options.

 **Not recommended for:**

-- Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
+- Extracting content from multiple pages (use repeated scrape calls for known URLs, or map + scrape to discover URLs first, or crawl for full page content)
 - When you're unsure which page contains the information (use search)

 **Common mistakes:**

-- Using scrape for a list of URLs (use batch_scrape instead).
+- Passing a list of URLs to one scrape call. Call scrape once per URL in MCP. If you specifically need one bulk API operation, use the Firecrawl API batch endpoint outside MCP.
 - Using markdown format by default (use JSON format to extract only what you need).

 **Choosing the right format:**
@@ -452,72 +449,7 @@ Scrape content from a single URL with advanced options.

 - JSON structured data, markdown, branding profile, or other formats as specified.

-### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)
-
-Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.
-
-**Best for:**
-
-- Retrieving content from multiple pages, when you know exactly which pages to scrape.
-
-**Not recommended for:**
-
-- Discovering URLs (use map first if you don't know the URLs)
-- Scraping a single page (use scrape)
-
-**Common mistakes:**
-
-- Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)
-
-**Prompt Example:**
-
-> "Get the content of these three blog posts: [url1, url2, url3]."
-
-**Usage Example:**
-
-```json
-{
-  "name": "firecrawl_batch_scrape",
-  "arguments": {
-    "urls": ["https://example1.com", "https://example2.com"],
-    "options": {
-      "formats": ["markdown"],
-      "onlyMainContent": true
-    }
-  }
-}
-```
-
-**Returns:**
-
-- Response includes operation ID for status checking:
-
-```json
-{
-  "content": [
-    {
-      "type": "text",
-      "text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
-    }
-  ],
-  "isError": false
-}
-```
-
-### 3. Check Batch Status (`firecrawl_check_batch_status`)
-
-Check the status of a batch operation.
-
-```json
-{
-  "name": "firecrawl_check_batch_status",
-  "arguments": {
-    "id": "batch_1"
-  }
-}
-```
-
-### 4. Map Tool (`firecrawl_map`)
+### 2. Map Tool (`firecrawl_map`)

 Map a website to discover all indexed URLs on the site.

@@ -528,7 +460,7 @@ Map a website to discover all indexed URLs on the site.

 **Not recommended for:**

-- When you already know which specific URL you need (use scrape or batch_scrape)
+- When you already know which specific URL you need (use scrape)
 - When you need the content of the pages (use scrape after mapping)

 **Common mistakes:**
@@ -554,7 +486,7 @@ Map a website to discover all indexed URLs on the site.

 - Array of URLs found on the site

-### 5. Search Tool (`firecrawl_search`)
+### 3. Search Tool (`firecrawl_search`)

 Search the web and optionally extract content from search results.

@@ -599,7 +531,7 @@ Search the web and optionally extract content from search results.

 > "Find the latest research papers on AI published in 2023."

-### 5b. Search Feedback Tool (`firecrawl_search_feedback`)
+### 3b. Search Feedback Tool (`firecrawl_search_feedback`)

 Sends structured feedback on a previous `firecrawl_search` result. The first feedback per search id refunds 1 credit and improves Firecrawl's search quality. Idempotent per search id.

@@ -641,7 +573,7 @@ Sends structured feedback on a previous `firecrawl_search` result. The first fee

 - `{ success, feedbackId, creditsRefunded, alreadySubmitted? }` JSON.

-### 5c. Generic Feedback Tool (`firecrawl_feedback`)
+### 3c. Generic Feedback Tool (`firecrawl_feedback`)

 Sends structured feedback for a completed v2 endpoint job through `/v2/feedback`.
 Use this for endpoint-level feedback on `scrape`, `parse`, `map`, or `search`
@@ -678,7 +610,7 @@ and small metadata objects. Do not include raw scrape/parse outputs.

 - `{ success, feedbackId, creditsRefunded, creditsRefundedToday?, dailyRefundCap?, dailyCapReached?, alreadySubmitted?, warning? }` JSON.

-### 6. Crawl Tool (`firecrawl_crawl`)
+### 4. Crawl Tool (`firecrawl_crawl`)

 Starts an asynchronous crawl job on a website and extract content from all pages.

@@ -689,10 +621,10 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 **Not recommended for:**

 - Extracting content from a single page (use scrape)
-- When token limits are a concern (use map + batch_scrape)
+- When token limits are a concern (use map + scrape for tighter control)
 - When you need fast results (crawling can be slow)

-**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.

 **Common mistakes:**

@@ -734,7 +666,7 @@ Starts an asynchronous crawl job on a website and extract content from all pages
 }
 ```

-### 7. Check Crawl Status (`firecrawl_check_crawl_status`)
+### 5. Check Crawl Status (`firecrawl_check_crawl_status`)

 Check the status of a crawl job.

@@ -751,7 +683,7 @@ Check the status of a crawl job.

 - Response includes the status of the crawl job:

-### 8. Extract Tool (`firecrawl_extract`)
+### 6. Extract Tool (`firecrawl_extract`)

 Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

@@ -824,7 +756,7 @@ When using a self-hosted instance, the extraction will use your configured LLM.
 }
 ```

-### 9. Agent Tool (`firecrawl_agent`)
+### 7. Agent Tool (`firecrawl_agent`)

 Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.

@@ -905,7 +837,7 @@ Then poll with `firecrawl_agent_status` using the returned job ID.

 - Job ID for status checking. Use `firecrawl_agent_status` to poll for results.

-### 10. Check Agent Status (`firecrawl_agent_status`)
+### 8. Check Agent Status (`firecrawl_agent_status`)

 Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.

@@ -926,7 +858,7 @@ Check the status of an agent job and retrieve results when complete. Use this to
 - `completed`: Research finished - response includes the extracted data
 - `failed`: An error occurred

-### 11. Monitor Tools (`firecrawl_monitor_*`)
+### 9. Monitor Tools (`firecrawl_monitor_*`)

 Create and manage recurring page monitors. Monitors run scheduled scrapes or crawls, diff each result against the last retained snapshot, and can notify by webhook or email.

@@ -1009,7 +941,6 @@ Example log messages:
 ```
 [INFO] Firecrawl MCP Server initialized successfully
 [INFO] Starting scrape for URL: https://example.com
-[INFO] Batch operation queued with ID: batch_1
 [WARNING] Credit usage has reached warning threshold
 [ERROR] Rate limit exceeded, retrying in 2s...
 ```
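
The README hunks above replace `batch_scrape` with one `scrape` call per URL. A minimal TypeScript sketch of that pattern follows. `ScrapeFn`, `scrapeEach`, and `stubScrape` are hypothetical stand-ins for whatever client call invokes the scrape tool. They are not Firecrawl or MCP SDK APIs, and the stub performs no network access.

```typescript
// Hypothetical signature for "scrape one URL"; a real client would wrap
// its own MCP tool invocation behind this type.
type ScrapeFn = (url: string) => Promise<string>;

interface ScrapeOutcome {
  url: string;
  ok: boolean;
  content?: string;
  error?: string;
}

// Calls scrape once per URL, in order, and records per-URL failures
// instead of aborting the whole list on the first error.
async function scrapeEach(
  urls: string[],
  scrape: ScrapeFn,
): Promise<ScrapeOutcome[]> {
  const outcomes: ScrapeOutcome[] = [];
  for (const url of urls) {
    try {
      outcomes.push({ url, ok: true, content: await scrape(url) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ url, ok: false, error: message });
    }
  }
  return outcomes;
}

// Stub used for illustration only.
const stubScrape: ScrapeFn = async (url) => `content of ${url}`;

scrapeEach(["https://example.com/a", "https://example.com/b"], stubScrape)
  .then((outcomes) => console.log(outcomes));
```

The loop is sequential on purpose: it keeps request volume predictable and leaves rate limiting to the server, at the cost of throughput. Where a single bulk operation is required, the README points to the Firecrawl API batch endpoint outside MCP instead.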

src/index.ts

Lines changed: 2 additions & 2 deletions
@@ -1832,8 +1832,8 @@ server.addTool({
 Starts a crawl job on a website and extracts content from all pages.

 **Best for:** Extracting content from multiple related pages, when you need comprehensive coverage.
-**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + batch_scrape); when you need fast results (crawling can be slow).
-**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.
+**Not recommended for:** Extracting content from a single page (use scrape); when token limits are a concern (use map + scrape for tighter control); when you need fast results (crawling can be slow).
+**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + scrape for tighter control.
 **Common mistakes:** Setting limit or maxDiscoveryDepth too high (causes token overflow) or too low (causes missing pages); using crawl for a single page (use scrape instead). Using a /* wildcard is not recommended.
 **Prompt Example:** "Get all blog posts from the first two levels of example.com/blog."
 **Usage Example:**
