OpenClaw Web Search: The Automatic Research System Explained

Key Takeaways

Brave Search is the best default — generous free tier, clean results, and no tracking. Start here unless you have a specific reason to use another provider.

Tavily returns AI-optimized result summaries that reduce the need for a separate firecrawl content extraction step — faster for shallow research tasks.

SearXNG is the only zero-cost, self-hosted option — ideal for privacy-sensitive workflows or high-volume search that would strain API quotas.

Citations don't happen automatically — you must explicitly instruct the agent to record and reference source URLs in every output it produces.

Query design is the highest-leverage input in any search-enabled pipeline. Three well-constructed queries typically outperform ten generic ones.

The search provider decision gets made once and then ignored — most builders pick the first one that works and never revisit it. That's a mistake. The provider shapes what sources the agent finds, how much processing is needed after retrieval, and what your monthly API costs look like at scale. Get this decision right at the start.

Search Provider Comparison

OpenClaw supports four search providers, each with different trade-offs across coverage, cost, privacy, and result format. Here's the full comparison:

Provider	Free Tier	Result Format	Best For	Privacy
Brave Search	2,000 calls/month	URLs + snippets	General research, default setup	High — no tracking
Tavily	1,000 calls/month	AI summaries + URLs	Fast shallow research	Medium
SearXNG	Unlimited (self-hosted)	URLs + snippets	Privacy-sensitive, high-volume	Maximum — self-hosted
Perplexity	Limited trial	Cited answers + sources	Answer synthesis with citations	Low — data used for training

Brave is the right default for most use cases. It indexes the open web without tracking, returns clean structured results, and its free tier covers 2,000 queries per month, enough for active research agents running multiple tasks daily.

Tavily's AI-optimized summaries sound appealing but come with a trade-off: the summaries are pre-processed by Tavily's models, which introduces an additional layer between the source and your agent. For deep research where you need the original source content, Brave plus firecrawl gives you cleaner access to the actual material.

Tip: SearXNG for high-volume workflows

If your agent runs more than 50 search queries per day, self-hosted SearXNG eliminates API costs entirely. The Docker deployment takes 15 minutes and the instance handles hundreds of queries per day without rate limiting. Point OpenClaw at your instance URL and you're done.
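
Before pointing OpenClaw at the instance, it can help to confirm that the instance answers searches at all. The following is a minimal Python sketch, assuming the instance has the JSON output format enabled in its settings (not every SearXNG install does). `build_searxng_url` is a hypothetical helper, not part of OpenClaw, and it only builds the request URL without sending anything.

```python
from typing import Optional, Sequence
from urllib.parse import parse_qs, urlencode, urlsplit


def build_searxng_url(base_url: str, query: str,
                      engines: Optional[Sequence[str]] = None,
                      language: str = "en") -> str:
    """Build a SearXNG search URL that asks for JSON output.

    Assumption: the instance allows format=json in its settings.
    No network request is made here.
    """
    params = {"q": query, "format": "json", "language": language}
    if engines:
        # SearXNG expects a comma-separated engine list
        params["engines"] = ",".join(engines)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


url = build_searxng_url(
    "http://localhost:8080",
    "openclaw web search",
    engines=["google", "bing", "duckduckgo"],
)
print(url)
```

Fetching that URL with any HTTP client should return a JSON document if the instance is configured as assumed. An error response usually means the JSON format is disabled on the instance.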

Configuring Each Search Provider

Each provider uses the same search_config block in your CLAUDE.md, with provider-specific settings.

# Brave Search configuration
skills:
  - web_search
  - firecrawl  # pair with Brave for full content extraction

search_config:
  provider: brave
  api_key: "{{BRAVE_API_KEY}}"   # from api.search.brave.com
  results_per_query: 5           # 3-10 recommended
  country: "us"                  # localize results

---

# Tavily configuration
search_config:
  provider: tavily
  api_key: "{{TAVILY_API_KEY}}"  # from app.tavily.com
  search_depth: "advanced"       # basic or advanced
  include_answer: true           # returns synthesized answer
  max_results: 5

---

# SearXNG configuration (self-hosted)
search_config:
  provider: searxng
  base_url: "http://localhost:8080"  # your instance URL
  engines: ["google", "bing", "duckduckgo"]
  language: "en"

---

# Perplexity configuration
search_config:
  provider: perplexity
  api_key: "{{PERPLEXITY_API_KEY}}"  # from perplexity.ai/api
  model: "pplx-7b-online"            # or pplx-70b-online; model names change, so check Perplexity's current list
  return_citations: true

We'll get to result parsing in a moment, but the results_per_query setting deserves attention now. More results per query isn't always better. At 10 results, the agent spends more tool calls processing low-quality results at the bottom of the list. Three to five high-quality results per query consistently outperform ten results of mixed quality.

Building a Search-Enabled Agent

A search-enabled agent needs three things beyond the provider configuration: a query generation strategy, a result processing step, and an output assembly instruction. Most builders configure the provider and skip the other three — which is why their search agents produce shallow, inconsistently cited outputs.

# Complete search-enabled research agent
system: |
  You are a research agent. Task: {{RESEARCH_TOPIC}}

  QUERY STRATEGY:
  Generate queries from three angles:
  1. Direct: "{{RESEARCH_TOPIC}} [specific aspect]"
  2. Adjacent: "{{RESEARCH_TOPIC}} compared to [alternatives]"
  3. Critical: "{{RESEARCH_TOPIC}} problems limitations criticism"
  Run each angle with 1-2 queries before synthesizing.

  RESULT PROCESSING:
  For each search result:
  - Note the source URL and publication date
  - Extract the specific claim or data point relevant to your task
  - Flag any result older than 18 months as potentially outdated
  - Skip results behind paywalls (mark as "PAYWALLED: [URL]")

  OUTPUT ASSEMBLY:
  Write findings organized by angle (direct, adjacent, critical).
  Every factual claim: include source URL in brackets.
  End with: numbered source list, publication date per source.

  LIMITS:
  Maximum 15 search queries per task.
  Minimum 8 unique sources in final output.

skills:
  - web_search
  - firecrawl
  - file_write

search_config:
  provider: brave
  api_key: "{{BRAVE_API_KEY}}"
  results_per_query: 5
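
The three-angle query strategy in the prompt above can also be prototyped outside the agent, which is handy for checking what the queries look like before spending API calls. This is a rough Python sketch. `build_queries` and its parameters are hypothetical names for illustration, and in a real run the agent itself fills in the aspect and the alternatives.

```python
def build_queries(topic: str, aspect: str, alternative: str) -> dict:
    """Return one query per angle: direct, adjacent, critical."""
    return {
        "direct": f"{topic} {aspect}",
        "adjacent": f"{topic} compared to {alternative}",
        "critical": f"{topic} problems limitations criticism",
    }


queries = build_queries(
    topic="vector databases",
    aspect="indexing latency",
    alternative="Postgres pgvector",
)
for angle, query in queries.items():
    print(f"{angle}: {query}")
```

Keeping the angles as named keys makes it easy to organize findings by angle later, as the output assembly section of the prompt asks.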

Warning: Query limits prevent runaway API costs

Without a maximum query count, an overly thorough research agent can run 50+ queries on a complex topic — exhausting your monthly free tier in a single run. Set a limit of 10-20 queries per task and monitor actual query counts during the first week of operation.
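
A prompt-level limit relies on the agent following instructions. If you also control the code that issues searches, a hard counter is a useful backstop. Here is a small Python sketch of that idea. `QueryBudget` is a hypothetical wrapper, not an OpenClaw feature, and where you would hook it in depends on your setup.

```python
class QueryBudgetExceeded(RuntimeError):
    """Raised when a task tries to run more queries than allowed."""


class QueryBudget:
    def __init__(self, max_queries: int = 15):
        self.max_queries = max_queries
        self.used = 0

    def spend(self, query: str) -> int:
        """Count one query and return how many remain.

        Raises QueryBudgetExceeded instead of letting the search run.
        """
        if self.used >= self.max_queries:
            raise QueryBudgetExceeded(
                f"limit of {self.max_queries} queries reached, refused: {query!r}"
            )
        self.used += 1
        return self.max_queries - self.used


budget = QueryBudget(max_queries=2)
remaining_after_first = budget.spend("first query")
remaining_after_second = budget.spend("second query")
print(remaining_after_first, remaining_after_second)
```

Logging `budget.used` at the end of each task also gives you the actual query counts to monitor during the first week.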

Result Parsing Patterns

Brave and SearXNG return raw results as URLs, titles, and snippets. The snippet is a short extract from the page: useful for relevance assessment, but not sufficient for research. The standard pattern is to assess relevance from the snippet and then pass the URL to firecrawl for full content extraction.

# Result processing pattern in system prompt
system: |
  For each search result returned:

  Step 1 — Relevance check: Read the title and snippet.
  If clearly irrelevant, skip. Do not extract content.

  Step 2 — Content extraction: For relevant results,
  use firecrawl to extract full page content.

  Step 3 — Data extraction: From the full content,
  extract only the specific information relevant to your
  research question. Discard the rest.

  Step 4 — Attribution: Record: source URL, publication date
  (if found), and the specific extracted data point.

  This process runs once per result. Never re-extract.
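
If you post-process the agent's notes in code, Step 4 maps naturally onto a small record type. The sketch below is one possible shape in Python, combined with the 18-month staleness rule from the research agent prompt earlier. `SourceRecord` and its field names are assumptions for illustration, not an OpenClaw data structure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STALE_AFTER_MONTHS = 18  # threshold taken from the research agent prompt


@dataclass
class SourceRecord:
    url: str
    data_point: str
    published: Optional[date] = None  # None when no date was found

    def is_stale(self, today: date) -> bool:
        """True if the source is more than 18 months old.

        Undated sources are not flagged, since their age is unknown.
        """
        if self.published is None:
            return False
        months = ((today.year - self.published.year) * 12
                  + (today.month - self.published.month))
        if months != STALE_AFTER_MONTHS:
            return months > STALE_AFTER_MONTHS
        # Exactly 18 calendar months apart: the day of month decides
        return today.day > self.published.day


record = SourceRecord(
    url="https://example.com/report",
    data_point="Placeholder claim extracted from the page",
    published=date(2023, 1, 15),
)
print(record.is_stale(date(2024, 7, 16)))
```

Undated sources pass through unflagged here. Whether to treat them as suspect instead is a judgment call for your pipeline.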

The relevance check step is critical for cost control. Processing every result through firecrawl regardless of relevance wastes both time and API calls. A quick snippet assessment before content extraction cuts unnecessary extraction by 30–50% on most research tasks.
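
To make the relevance gate concrete, here is a deliberately simple Python sketch. In practice the agent judges relevance from the title and snippet itself. The keyword-overlap check below is only a stand-in heuristic, and the function name, result fields, and `min_hits` threshold are all assumptions.

```python
import re


def is_relevant(title: str, snippet: str, keywords: set, min_hits: int = 2) -> bool:
    """Cheap gate: require at least min_hits keywords in title + snippet."""
    words = set(re.findall(r"[a-z0-9]+", f"{title} {snippet}".lower()))
    return len(words & {k.lower() for k in keywords}) >= min_hits


results = [
    {"url": "https://example.com/a",
     "title": "Vector database indexing benchmarks",
     "snippet": "Latency results for HNSW indexing."},
    {"url": "https://example.com/b",
     "title": "Best pasta recipes",
     "snippet": "Weeknight dinners in 20 minutes."},
]
keywords = {"vector", "database", "indexing", "latency"}

# Only URLs that pass the gate would be handed to content extraction
to_extract = [r["url"] for r in results
              if is_relevant(r["title"], r["snippet"], keywords)]
print(to_extract)
```

The point is the ordering, not the heuristic: the cheap check runs on data you already have, and the expensive extraction call happens only for results that pass.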

Citation Patterns for Search-Enabled Agents

Citations don't happen automatically. The model knows what it read, but it won't format source references in your output unless you explicitly instruct it to. This is the most common complaint about AI research outputs — the information looks right but there's no way to verify it.

Two citation patterns work reliably with OpenClaw:

Inline URL citations. After every factual claim, the agent includes the source URL in brackets. Simple, immediate, verifiable. Best for prose outputs where you want readers to follow through to source material.

# Inline citation instruction
system: |
  Citation rule: After every factual claim, include the source
  URL in brackets like this: "The market grew 34% YoY [https://source.com/article]"
  Never state a fact without a bracket citation.
  If you cannot find a source for a claim, do not include the claim.

Numbered footnote citations. Claims reference a number, sources listed at the end. Better for formal outputs, academic-style writing, or when inline URLs would disrupt readability.

# Footnote citation instruction
system: |
  As you write, maintain a numbered source list.
  When referencing a source, use bracketed numbers: [1], [2], etc.
  At the end of your output, include:

  ## Sources
  [1] Title — URL — Date
  [2] Title — URL — Date
  (continue for all sources used)
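
If you assemble or verify the source list in code rather than leaving it entirely to the model, the bookkeeping is small. A Python sketch follows, assuming each source is identified by its URL. `SourceList` is a hypothetical helper, not something OpenClaw provides.

```python
class SourceList:
    """Assigns stable footnote numbers to sources, keyed by URL."""

    SEP = " \u2014 "  # the dash separator used in the Sources format above

    def __init__(self):
        self._numbers = {}   # url -> footnote number
        self._entries = []   # (title, url, date) in first-cited order

    def cite(self, title: str, url: str, date: str = "n.d.") -> str:
        """Return the marker for a source, registering it on first use."""
        if url not in self._numbers:
            self._entries.append((title, url, date))
            self._numbers[url] = len(self._entries)
        return f"[{self._numbers[url]}]"

    def render(self) -> str:
        lines = ["## Sources"]
        for number, entry in enumerate(self._entries, start=1):
            lines.append(f"[{number}] " + self.SEP.join(entry))
        return "\n".join(lines)


sources = SourceList()
first = sources.cite("Report A", "https://example.com/a", "2024-01-10")
second = sources.cite("Report B", "https://example.com/b")
repeat = sources.cite("Report A", "https://example.com/a", "2024-01-10")
print(first, second, repeat)
print(sources.render())
```

Citing the same URL twice reuses its number, so the final list contains each source once.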

The "never state a fact without a citation" rule is the most important. Without it, the agent blends cited information with model knowledge, producing outputs where some claims are verifiable and some are hallucinations with no distinguishing marker.
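
A quick way to spot violations of that rule is to scan the output for sentences with no bracketed URL. The Python sketch below is a rough check under two assumptions: citations use the inline `[https://...]` format, and each citation sits before the sentence's closing punctuation. It treats every sentence as a factual claim, so expect false positives on transitions and headings.

```python
import re

CITATION = re.compile(r"\[https?://[^\s\]]+\]")


def uncited_sentences(text: str) -> list:
    """Return sentences that contain no inline [URL] citation."""
    # Mask citations first so dots inside URLs cannot split a sentence
    masked = CITATION.sub("[CITE]", text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", masked)]
    return [s for s in sentences if s and "[CITE]" not in s]


draft = (
    "The market grew 34% YoY [https://source.com/article]. "
    "Adoption doubled last year."
)
missing = uncited_sentences(draft)
print(missing)
```

Anything this flags is a candidate for manual review: either the claim needs a source or it should be removed, per the citation rule.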

Common Web Search Configuration Mistakes

No maximum query count. Uncapped research agents exhaust free tier quotas in single runs on complex topics. Set a limit of 10-20 queries per task from day one.
No citation instruction in the system prompt. Without explicit citation instructions, the agent produces unverifiable prose. Every research agent system prompt needs a citation rule.
Processing every result through firecrawl. Relevance-checking snippets first cuts unnecessary content extraction by 30-50%. Add a relevance gate before firecrawl calls.
Using Tavily for deep content research. Tavily's pre-summarized results lose original source detail. Use Brave or SearXNG plus firecrawl when you need full source content.
Hardcoding API keys in config files. Keys committed to version control are exposed to anyone with repository access. Once leaked, they have to be rotated, which breaks every pipeline still using the old key. Always use environment variables.
Generic queries without angle strategy. "What is [topic]" produces Wikipedia-level results. Structure queries across direct, adjacent, and critical angles to surface the full picture.
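
On the API key point, any script you write around the agent should read the key from the environment and fail loudly when it is missing. A minimal Python sketch is below. `load_api_key` is a hypothetical helper, and how OpenClaw itself resolves placeholders like {{BRAVE_API_KEY}} is not covered here.

```python
import os


def load_api_key(env_var: str) -> str:
    """Read an API key from the environment, refusing empty values."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or secrets "
            "manager instead of writing the key into a config file."
        )
    return key


# Demo only: set a fake value so the example runs end to end
os.environ["BRAVE_API_KEY"] = "demo-key-not-real"
api_key = load_api_key("BRAVE_API_KEY")
print(len(api_key))
```

Failing at startup is the design choice that matters: a missing key surfaces immediately rather than as a confusing authentication error halfway through a research run.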

Frequently Asked Questions

Which search provider should I use with OpenClaw?

Brave Search is the best default — it has a generous free tier, good coverage, and returns clean structured results. Use Tavily if your use case requires AI-optimized result summaries out of the box. Use SearXNG if you need a self-hosted, privacy-preserving option with no API costs. Use Perplexity if you need search with built-in citation and answer synthesis.

How do I configure web search in OpenClaw?

Add 'web_search' to your skills list in CLAUDE.md and set your chosen provider plus API key in the search_config block. Brave and Tavily require API keys (both have free tiers). SearXNG requires a self-hosted instance URL. Perplexity uses the pplx-api endpoint with an API key from their developer portal.

Can OpenClaw search without an API key?

Yes, via self-hosted SearXNG. Deploy a SearXNG instance (Docker image available), point OpenClaw's search config at your instance URL, and no external API key is needed. This is the recommended approach for privacy-sensitive workflows or high-volume search that would incur significant API costs.

How does OpenClaw handle search result citations?

OpenClaw doesn't add citations automatically — you must instruct the agent to include them. Add citation instructions to your system prompt: 'For every factual claim, include the source URL in brackets after the claim.' For structured outputs, instruct the agent to maintain a numbered source list and reference by number throughout the document.

What is the difference between Brave and Tavily for OpenClaw search?

Brave returns standard search results — URLs, titles, snippets — that OpenClaw then processes through firecrawl or the browser skill for full content. Tavily returns pre-processed, AI-optimized summaries of results, which reduces the need for a separate content extraction step. Tavily is faster for shallow research; Brave plus firecrawl is better for deep content extraction.

How many search queries does OpenClaw use per research task?

A well-configured research agent runs 5-15 search queries per task — initial broad queries, follow-up specific queries, and gap-filling queries identified during synthesis. Each query costs one API call to your search provider. On Brave's free tier (2,000 calls/month), a 10-query research task costs 0.5% of your monthly quota.
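
The quota arithmetic is easy to rerun for your own numbers. Here is a short Python sketch using the free-tier figures from the comparison table. The function names are made up for illustration, and the one-call-per-query assumption comes from the answer above.

```python
def quota_share_percent(queries_per_task: int, monthly_quota: int) -> float:
    """Percentage of the monthly quota that one task consumes."""
    return 100 * queries_per_task / monthly_quota


def tasks_per_month(queries_per_task: int, monthly_quota: int) -> int:
    """Whole tasks that fit in the quota, assuming one call per query."""
    return monthly_quota // queries_per_task


BRAVE_FREE_TIER = 2000   # calls/month, from the comparison table
TAVILY_FREE_TIER = 1000  # calls/month, from the comparison table

print(quota_share_percent(10, BRAVE_FREE_TIER))  # share used by a 10-query task
print(tasks_per_month(10, BRAVE_FREE_TIER))      # 10-query tasks per month on Brave
print(tasks_per_month(15, BRAVE_FREE_TIER))      # tasks at the 15-query cap
print(tasks_per_month(10, TAVILY_FREE_TIER))     # 10-query tasks per month on Tavily
```

At the 15-query cap used in the example agent, the Brave free tier covers 133 tasks per month, which is a little over four per day.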

You now have everything needed to configure any of the four supported search providers, build a properly structured search-enabled agent, parse results efficiently, and produce outputs with verifiable citations. Start with Brave — get your API key, add the config block, run your first research task, and verify the citation output matches your system prompt instructions. The entire setup takes under 30 minutes. Everything after that is refinement.

R. Nakamura

Developer Advocate

R. Nakamura has configured and benchmarked all four OpenClaw search providers across research, monitoring, and content pipeline use cases. Maintains a public comparison of search provider performance across different research task types for the OpenClaw community.